Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords
Date: 2007-11-14 17:06:09
Message-ID: Pine.LNX.4.64.0711141950040.7787@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

In principle the right way is to allow any dictionary have option
like 'PassThrough' and internal function get_dict_options(dict, option)
to check if PassThrough option is true.
Let's consider one example - removing accents.
In the past I always recommend people to use regex functions before
to_tsvector conversion to remove accents, but recently I was noticed that
such trick doesn't work with headline(). So, the only way is to have
special dictionary dict_remove_accent before, which works as a filter.

I don't remember why do we left this for future releases, though.

Oleg
On Wed, 14 Nov 2007, Tom Lane wrote:

> This patch:
> http://archives.postgresql.org/pgsql-patches/2007-11/msg00137.php
> seems simple and useful enough that I think we ought to slip it into
> 8.3, even though we are far past feature freeze.
>
> As the "simple" dictionary type stands in CVS HEAD, it is only useful as
> the last dictionary in a stack, since it never passes anything on as
> unrecognized. With the proposed AcceptAll = false option, it could be
> used to filter out some stopwords before feeding tokens to another
> dictionary. While most dictionary types have their own stopword support,
> some of them match stopwords after their own normalization processing,
> and so there's no way to filter on pre-normalized words. That seems
> like a good improvement, even without the specific need-example that
> Jan provided at the start of the thread.
>
> Normally we'd never consider adding a new feature so late in the
> development cycle, but this seems small enough and useful enough
> to make an exception. Comments?
>
> regards, tom lane
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-11-14 17:17:25 Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords
Previous Message Tom Lane 2007-11-14 16:50:55 Re: Fix pg_dump dependency on postgres.h

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2007-11-14 17:17:25 Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords
Previous Message Tom Lane 2007-11-14 16:50:55 Re: Fix pg_dump dependency on postgres.h