Re: tsearch in core patch

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Hannu Krosing <hannu(at)skype(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsearch in core patch
Date: 2007-06-22 15:02:37
Message-ID: Pine.LNX.4.64.0706221856340.1881@sn.sai.msu.ru
Lists: pgsql-hackers

On Fri, 22 Jun 2007, Bruce Momjian wrote:

> Tom Lane wrote:
>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>> I very much doubt that the different spanishes are any different in the
>>> stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
>>> but in the case of portuguese I'm not so sure. Maybe there are other
>>> examples (like chinese, but I'm not sure how useful is tsearch for
>>> chinese).
>>
>>> And the .ISO8859-1 part you don't need at all if you accept that the
>>> files are UTF8 by design, as Tom proposed.
>>
>> Also, the problem we're dealing with here is mainly lack of
>> standardization of the encoding part of locale names. AFAIK, just about
>> everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
>> after that (if any) that is not too consistent across platforms.
>> So I see no problem in distinguishing between pt_PT and pt_BR if it
>> turns out we have to. The trick is to not look at any more of the
>> locale name than that; and if we standardize on "stopword files are
>> UTF8" then I don't think we need to.
>
> OK, and the open question is when do we do this default setting. If we
> do it in initdb then we can isolate all the detection there.

We can do that at initdb time, but we still have to decide how to map the
language part of the locale name to a human-readable language name. Are
we going to hardcode it?
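
To make the question concrete, here is a rough sketch of what a hardcoded
mapping could look like. It is only an illustration, not the patch's code:
the table entries, the function name config_for_locale and the "simple"
fallback are all assumptions. It looks at no more of the locale name than
the lang[_TERRITORY] prefix, as Tom suggested, and ignores the encoding
part entirely.

```python
# Hypothetical table: language part of the locale name -> configuration name.
# The entries are illustrative, not a proposed final list.
LANG_TO_CONFIG = {
    "en": "english",
    "es": "spanish",
    "pt": "portuguese",
    "ru": "russian",
}

# Hypothetical per-territory overrides, empty by default. An entry such as
# "pt_BR" would only be added if its rules turn out to differ from "pt".
TERRITORY_OVERRIDES = {}


def config_for_locale(locale_name, default="simple"):
    """Pick a configuration name from the lang[_TERRITORY] prefix only.

    The fallback name "simple" is an assumption made for this sketch.
    """
    # Drop the encoding and modifier parts: "es_ES.ISO8859-1@euro" -> "es_ES"
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    lang, _, territory = base.partition("_")
    lang = lang.lower()
    key = f"{lang}_{territory.upper()}" if territory else lang
    # A territory-specific entry wins over the plain language entry.
    if key in TERRITORY_OVERRIDES:
        return TERRITORY_OVERRIDES[key]
    return LANG_TO_CONFIG.get(lang, default)


if __name__ == "__main__":
    for name in ("es_ES.ISO8859-1", "es_AR", "ru_RU.UTF-8", "pt_BR.utf8", "C"):
        print(name, "->", config_for_locale(name))
```

With a table like this, es_ES, es_PE, es_AR and es_CL all land on the same
configuration, and pt_PT and pt_BR can be split later by adding a single
override without touching the lookup.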

That is not friendly to hosted installations, where people often have no
access to postgresql.conf and so have to remember to set tsearch_conf_name
themselves. It could be solved with the 'alter user ... set tsearch_conf_name'
command, though.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
