Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords
Date: 2007-11-14 17:17:25
Message-ID: 25604.1195060645@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
> Let's consider one example - removing accents.
> In the past I always recommended that people use regex functions before
> the to_tsvector conversion to remove accents, but recently I noticed that
> this trick doesn't work with headline(). So the only way is to have a
> special dictionary, dict_remove_accent, placed first, which works as a filter.

> I don't remember why we left this for future releases, though.

That would require a system-to-dictionary API change (to be able to
modify the token under inspection), no? So it's certainly something
I'd say is too late for 8.3.
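
To make the API point concrete, here is a minimal sketch in Python (not
PostgreSQL code) of a dictionary chain in which one dictionary acts as a
filter. Every name in it (Result, remove_accents, simple, run_chain) is
hypothetical, and it assumes a dictionary can currently only produce lexemes
or decline a token. The filter needs a third outcome: hand a rewritten token
on to the next dictionary in the chain.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Result:
    # Hypothetical result type, for illustration only.
    lexemes: Optional[List[str]] = None  # final answer; [] means "stopword"
    rewritten: Optional[str] = None      # filter outcome: pass this token on


Dictionary = Callable[[str], Optional[Result]]


def remove_accents(token: str) -> Optional[Result]:
    """Filter dictionary: strips combining marks and never finalizes a token."""
    decomposed = unicodedata.normalize("NFD", token)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return Result(rewritten=stripped)


def simple(token: str) -> Optional[Result]:
    """Ordinary dictionary: lowercases and drops a few illustrative stopwords."""
    stopwords = {"the", "a"}
    lowered = token.lower()
    return Result(lexemes=[] if lowered in stopwords else [lowered])


def run_chain(token: str, chain: List[Dictionary]) -> Optional[List[str]]:
    """Consult dictionaries in order; a filter replaces the token under inspection."""
    for dictionary in chain:
        result = dictionary(token)
        if result is None:
            continue                  # declined: try the next dictionary
        if result.rewritten is not None:
            token = result.rewritten  # the step that needs the API change
            continue
        return result.lexemes
    return None                       # no dictionary recognized the token
```

With this sketch, run_chain("Hôtel", [remove_accents, simple]) passes the
unaccented token "Hotel" to the second dictionary instead of the original.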

One thought that came to mind is that the option name should be just
"Accept" not "AcceptAll". To me "All" implies that it would accept
*everything* ... including stopwords.
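
The naming concern can be shown with a small Python sketch of a
stopword-only dictionary. The function name, the stopword list, and the
behaviour when the flag is off (reporting the token as unrecognized so a
later dictionary can handle it) are assumptions made for illustration, not
the patch's actual code. The point is that stopwords are rejected either
way, so the flag never means "accept everything".

```python
from typing import List, Optional

STOPWORDS = {"the", "a", "of"}  # illustrative list, not a real stopword file


def stopword_dict(token: str, accept: bool = True) -> Optional[List[str]]:
    """Hypothetical dictionary whose only job is to filter out stopwords."""
    lowered = token.lower()
    if lowered in STOPWORDS:
        return []          # stopword: dropped regardless of `accept`
    if accept:
        return [lowered]   # non-stopword accepted as is
    return None            # assumed: left unrecognized for the next dictionary
```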

regards, tom lane
