
Re: Can tsearch do some basic text mining

From:	"Phoenix Kiula" <phoenix(dot)kiula(at)gmail(dot)com>
To:	"Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc:	"Postgres General" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Can tsearch do some basic text mining
Date:	2007-08-25 01:15:54
Message-ID:	e373d31e0708241815r1505b482ma96c285c721e80d7@mail.gmail.com
Lists:	pgsql-general

On 25/08/07, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> On Fri, 24 Aug 2007, Phoenix Kiula wrote:
>
> > Hi,
> >
> > We have big blobs of text (average 10,000 characters) in a database,
> > from which we would like to discover the most often repeated words or
> > phrases. Can tsearch be used for this kind of pattern search? I
> > suppose it's Text Mining 101 sort of stuff, nothing complex.
>
> there is stat() function, see
> http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes
> for more details.
> It's not fast, so better to save results in a table
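As I understand it, tsearch2's stat() reports one row per word with `ndoc` (how many documents contain it) and `nentry` (how many times it occurs in total). The following is a minimal pure-Python sketch of that bookkeeping, so the numbers are easier to reason about. It is not tsearch itself. The regex tokenizer is an assumption and does no stemming or stop-word removal, which tsearch's dictionaries would do. The helper names (`tokenize`, `word_stats`, `top_words`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical crude tokenizer: lowercase runs of letters, digits, apostrophes.
# tsearch would instead use its parser plus dictionaries (stemming, stop words).
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def word_stats(documents):
    """Return {word: (ndoc, nentry)}, modelled on the columns stat() reports."""
    ndoc = Counter()
    nentry = Counter()
    for doc in documents:
        words = tokenize(doc)
        nentry.update(words)     # every occurrence counts
        ndoc.update(set(words))  # each document counts at most once per word
    return {w: (ndoc[w], nentry[w]) for w in nentry}


def top_words(documents, n=10):
    stats = word_stats(documents)
    # Most frequent first (nentry, then ndoc), ties broken alphabetically.
    ranked = sorted(stats.items(), key=lambda kv: (-kv[1][1], -kv[1][0], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog chased the cat"]
    for word, (nd, ne) in top_words(docs, 3):
        print(word, nd, ne)
```

Every document has to be scanned once, which is why the advice is to run the expensive pass once and store the results in a table rather than recomputing them on each request.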

Thanks. This seems to give words only. How about phrases? If counting
words is already this slow, I shudder to think how long phrase analysis
would take, if it is possible at all.
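One common way to approximate "most repeated phrases" is to count word n-grams (for example, every pair of adjacent words) instead of single words. Below is a minimal Python sketch of that idea. It is a substitute technique, not a tsearch feature: nothing in this thread says stat() handles phrases. The tokenizer, the helper names (`ngrams`, `top_phrases`) and the `min_count` threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical crude tokenizer, same idea as a plain word count.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def ngrams(tokens, n):
    # Sliding window of n consecutive tokens; n=2 yields bigrams.
    return zip(*(tokens[i:] for i in range(n)))


def top_phrases(documents, n=2, limit=10, min_count=2):
    """Count n-word phrases across all documents; keep those seen >= min_count times."""
    counts = Counter()
    for doc in documents:
        counts.update(" ".join(gram) for gram in ngrams(tokenize(doc), n))
    frequent = [(phrase, c) for phrase, c in counts.items() if c >= min_count]
    frequent.sort(key=lambda pc: (-pc[1], pc[0]))  # most frequent first
    return frequent[:limit]


if __name__ == "__main__":
    docs = [
        "text mining is fun and text mining is useful",
        "basic text mining with postgres",
    ]
    print(top_phrases(docs, n=2))
```

A document of k words yields about k bigrams, so the scan is still linear in the amount of text. The main extra cost is a much larger set of distinct keys to count and store, which is another reason to precompute and save the results.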

In response to

Re: Can tsearch do some basic text mining at 2007-08-24 17:53:23 from Oleg Bartunov
