From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Hannes Dorbath <light(at)theendofthetunnel(dot)de> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: TSearch2 / Get all unique lexems |
Date: | 2005-12-08 11:04:03 |
Message-ID: | Pine.GSO.4.63.0512081355160.13553@ra.sai.msu.su |
Lists: | pgsql-general |
On Thu, 8 Dec 2005, Hannes Dorbath wrote:
> On 07.12.2005 16:13, Oleg Bartunov wrote:
>> hmm, you could dump tsvector column and use awk+sort+uniq
>
> Thanks. I hoped for something possible inside a pl/pgsql proc. I'm trying to
> integrate pg_trgm with Tsearch2. I'm still on my UTF-8 database. Yes I know,
> there is _NO_ UTF-8 support of any kind in Tsearch2 yet, but I got it working
> to a degree that is OK for my application (Created my own stemmer variant,
> ispell dict, affix file etc). The last missing bit is to get a source for
> pg_trgm. I cannot use the stat() function, because it breaks as soon as it
> sees a UTF-8 char.
Unless there is some way to ignore errors when converting UTF-8 to text,
this is a dead end: the stat() function works on the text representation.
You have to wait for a new release with full UTF-8 support or go the 'lazy' way,
i.e. use any tool to get a list of unique words and create a pg_trgm index from it.
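
As a rough illustration of the 'lazy' way, here is a minimal Python sketch that pulls the unique lexemes out of a dumped tsvector column. It assumes each dumped line looks like 'cat':1,5 'dog':3, with lexemes in single quotes and an embedded quote written as two quotes; the function name unique_lexemes is hypothetical, and your dump may need different handling.

```python
import re

# One single-quoted lexeme; a doubled quote ('') inside it is an escaped quote.
# Assumption: the dump uses the quoted tsvector text form, e.g. 'cat':1,5 'dog':3
_LEXEME = re.compile(r"'((?:[^']|'')*)'")


def unique_lexemes(lines):
    """Return the sorted unique lexemes found in dumped tsvector text lines."""
    seen = set()
    for line in lines:
        for match in _LEXEME.finditer(line):
            # Undo the quote doubling; position lists after ':' are ignored.
            seen.add(match.group(1).replace("''", "'"))
    return sorted(seen)


if __name__ == "__main__":
    sample = ["'cat':1,5 'dog':3", "'dog':2 'o''neil':4"]
    for word in unique_lexemes(sample):
        print(word)
```

The resulting word list could then be loaded into a plain table and indexed with pg_trgm. Since the script works on already decoded strings, it does not go through stat() at all.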
There are several questions:
* Do you actually need to stay synchronized with the tsvector?
* Do you need to recognize all words? I suppose not. In real life you should
have a dictionary of the words you certainly need to recognize.
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83