Re: tsearch in core patch

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: teodor(at)sigaev(dot)ru
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hannu Krosing <hannu(at)skype(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch in core patch
Date: 2007-06-22 16:18:06
Message-ID: 20070622161806.GP8949@alvh.no-ip.org
Lists: pgsql-hackers

teodor(at)sigaev(dot)ru wrote:
> > Why not do it the other way around?
> > es_ES spanish
> > Spanish_Spain spanish
> > ru_RU russian
> > pt_BR portuguese_brazil
> >
> > That way you don't need any funny index. Or do you need the list of
> > locales for each language? (but even if you do, you can easily obtain it
> > by indexing both columns separately using btrees anyway)
>
> Yes, that's possible, but that increases the number of identical configurations:
> russian_win Russian_Russia
> russian_unix ru_RU
>
> They don't differ except in locale name.

But why do you need them to be different at all? Just make it

russian Russian_Russia
russian ru_RU

Does that not work for some reason?

What I was really suggesting was having a table mapping locale names
into "tsearch languages". Then the configuration could be made based on
the language, not on the locale name. So the stopword list is for
"russian", regardless of whether the locale is Russian_Russia or ru_RU.
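
To make the idea concrete, here is a minimal sketch of the two-level lookup
in Python. It is only an illustration, not a proposal for the actual catalog
layout: the names LOCALE_TO_LANGUAGE, LANGUAGE_CONFIG, config_for_locale and
locales_for_language are hypothetical, and the stopword file names are
placeholders.

```python
# Hypothetical table mapping locale names to "tsearch languages".
# Several locale spellings may point at the same language.
LOCALE_TO_LANGUAGE = {
    "es_ES": "spanish",
    "Spanish_Spain": "spanish",
    "ru_RU": "russian",
    "Russian_Russia": "russian",
    "pt_BR": "portuguese_brazil",
}

# Configuration is keyed by language, not by locale name.
# The file names are placeholders for illustration only.
LANGUAGE_CONFIG = {
    "spanish": {"stopwords": "spanish.stop"},
    "russian": {"stopwords": "russian.stop"},
    "portuguese_brazil": {"stopwords": "portuguese_brazil.stop"},
}


def config_for_locale(locale_name):
    """Resolve a locale name to its language, then to that language's config."""
    language = LOCALE_TO_LANGUAGE[locale_name]
    return language, LANGUAGE_CONFIG[language]


def locales_for_language(language):
    """Reverse lookup: every locale name that maps to the given language."""
    return sorted(
        locale for locale, lang in LOCALE_TO_LANGUAGE.items() if lang == language
    )
```

With this layout, config_for_locale("ru_RU") and
config_for_locale("Russian_Russia") return the same single "russian" entry,
so no russian_win/russian_unix duplication is needed, and the list of locales
for a language is still available through the reverse lookup.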

Is this only for the stopword list, or does it also affect selecting a
stemmer?

Note: it's possible that the stopword list for Brazilian Portuguese
differs from the one for European Portuguese, which is why I was
suggesting a language called "portuguese_brazil" and not just
"portuguese". Whereas you need only a single stopword list for all the
Spanish-speaking countries, which is why you need only one language
called "spanish".

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Llegará una época en la que una investigación diligente y prolongada sacará
a la luz cosas que hoy están ocultas" (Séneca, siglo I)
