From: | Euler Taveira de Oliveira <euler(at)timbira(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | teodor(at)sigaev(dot)ru, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hannu Krosing <hannu(at)skype(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: tsearch in core patch |
Date: | 2007-06-23 16:04:38 |
Message-ID: | 467D4496.4000208@timbira.com |
Lists: | pgsql-hackers |
Alvaro Herrera wrote:
> What I was really suggesting was having a table mapping locale names
> into "tsearch languages". Then the configuration could be made based on
> the language, not on the locale name. So the stopword list is for
> "russian", regardless of whether the locale is Russian_Russia or ru_RU.
>
Agreed. But I'm afraid we couldn't map all of the locale names
correctly. Man, it's a long list. ;)
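To make the idea concrete, here is a rough sketch of such a lookup. Everything in it is an assumption for illustration only: the table contents, the function names, and the "simple" fallback are not taken from the patch. It only shows that several locale spellings can resolve to one language name.

```python
# Hypothetical mapping from (normalized) locale names to tsearch language names.
# The entries are examples, not a complete or authoritative list.
LOCALE_TO_LANGUAGE = {
    "ru_ru": "russian",
    "russian_russia": "russian",
    "pt_br": "portuguese_brazil",
    "pt_pt": "portuguese",
    "es_es": "spanish",
    "es_mx": "spanish",
}


def normalize_locale(locale_name):
    """Drop any encoding or modifier suffix ('.UTF-8', '@euro') and lowercase."""
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    return base.strip().lower()


def language_for_locale(locale_name, default="simple"):
    """Return the language for a locale, or a fallback (assumed name) if unmapped."""
    return LOCALE_TO_LANGUAGE.get(normalize_locale(locale_name), default)
```

With this, both "Russian_Russia.1251" and "ru_RU.UTF-8" would resolve to "russian", and an unmapped locale falls back to the default instead of failing. The hard part remains filling in the table.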
> Is this only for the stopword list, or does it also affect selecting a
> stemmer?
>
Both.
> Note: it's possible that the stopword list is different for brazilian
> portuguese than portuguese portuguese, which is why I was suggesting
> using a language "portuguese_brazil" and not just "portuguese". Whereas
> you need a single stopword list for all the countries speaking spanish,
> which is why you need only one language called spanish.
>
Indeed it's possible for portuguese, because we have some words that are
written in different ways, e.g.,
pt_BR    pt_PT    english
Mônica   Mónica   Monica
ação     acção    action
Irã      Irão     Iran
.
.
.
Will it be possible to disable stemming or stopword removal? I'm asking
because sometimes stemming doesn't lead to good results and/or
stopwords are relevant. Maybe these could be GUC variables
('enable_stemming' and 'enable_stopwords').
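Roughly what I have in mind, as a toy sketch and not a proposal for the actual implementation. The stopword list and the suffix-stripping "stemmer" below are made up for the example (a real setup would use the configured stopword file and a Snowball stemmer); only the two switch names come from the suggestion above.

```python
# Toy stopword list, for illustration only.
STOPWORDS = {"the", "a", "of"}


def toy_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def normalize(text, enable_stemming=True, enable_stopwords=True):
    """Tokenize on whitespace, then apply each step only if its switch is on."""
    tokens = text.lower().split()
    if enable_stopwords:
        # 'enable_stopwords' here means stopword removal is active.
        tokens = [t for t in tokens if t not in STOPWORDS]
    if enable_stemming:
        tokens = [toy_stem(t) for t in tokens]
    return tokens
```

With both switches off, the text is only lowercased and split, so a query where the stopwords matter still sees every word.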
--
Euler Taveira de Oliveira
http://www.timbira.com/