Re: How does the tsearch configuration get selected?

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Michael Paesold <mpaesold(at)gmx(dot)at>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: How does the tsearch configuration get selected?
Date: 2007-06-15 04:00:10
Message-ID: Pine.LNX.4.64.0706150745090.1881@sn.sai.msu.ru
Lists: pgsql-advocacy pgsql-hackers

On Thu, 14 Jun 2007, Tom Lane wrote:

> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> First, why are we specifying the server locale here since it never
>> changes:

The server's locale is used for just one purpose: to select which text
search configuration to use by default. Any text search function can
accept a text search configuration as an optional parameter.
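
For illustration, the two calling styles might look as follows. This is
only a sketch: the configuration name 'english' and the sample text are
assumptions, and the function forms are the to_tsvector(config, text) and
to_tsvector(text) variants mentioned further down in this message.

```sql
-- Explicit configuration: the caller names the configuration, so the
-- result does not depend on the server's locale or any default.
-- ('english' is an assumed configuration name.)
SELECT to_tsvector('english', 'The quick brown foxes jumped');

-- Implicit configuration: whichever configuration is the default for
-- this server/session is used.
SELECT to_tsvector('The quick brown foxes jumped');
```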

>
> It's poorly described. What it should really say is the language
> that the text-to-be-searched is in. We can actually support multiple
> languages here today, the restriction being that there have to be
> stemmer instances for the languages with the database encoding you're
> using. With UTF8 encoding this isn't much of a restriction. We do need
> to put code into the dictionary stuff to enforce that you can't use a
> stemmer when the database encoding isn't compatible with it.
>
> I would prefer that we not drive any of this stuff off the server's
> LC_xxx settings, since as you say that restricts things to just one
> locale.

Something like

CREATE TEXT SEARCH DICTIONARY dictname [LOCALE=ru_RU.UTF-8]

which would raise a warning or error if the database encoding doesn't
match the dictionary encoding, when one is specified. Not all
dictionaries depend on encoding, so it should be an optional parameter.
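
The database side of such a check can already be inspected from SQL. A
minimal sketch, assuming a hypothetical dictionary whose files are encoded
in UTF-8 (the literal 'UTF8' below stands in for whatever encoding the
dictionary would declare; the column aliases are made up):

```sql
-- getdatabaseencoding() returns the name of the current database's
-- encoding. Comparing it with the dictionary's declared encoding is the
-- test that would decide whether to raise the warning/error.
SELECT getdatabaseencoding() AS db_encoding,
       getdatabaseencoding() = 'UTF8' AS matches_dictionary_encoding;
```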

>
>> Second, I can't figure out how to reference a non-default
>> configuration.
>
> See the multi-argument versions of to_tsvector etc.
>
> I do see a problem with having to_tsvector(config, text) plus
> to_tsvector(text) where the latter implicitly references a config
> selected by a GUC variable: how can you tell whether a query using the
> latter matches a particular index using the former? There isn't
> anything in the current planner mechanisms that would make that work.

Probably, having a default text search configuration is not a good idea,
and we could just make it a mandatory parameter, which would eliminate
much of the confusion around selecting a text search configuration.
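
The index-matching problem can be sketched like this. The table, column
and index names are hypothetical, 'english' is an assumed configuration
name, and the syntax is the form found in released PostgreSQL, which may
differ from the patch under discussion:

```sql
-- Hypothetical table holding the documents to be searched.
CREATE TABLE docs (id serial PRIMARY KEY, body text);

-- Expression index built with an explicitly named configuration.
CREATE INDEX docs_body_fts ON docs
    USING gin (to_tsvector('english', body));

-- Same expression as in the index definition, so the planner can
-- consider using the index.
SELECT id FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'fox');

-- Configuration chosen implicitly: the expression differs from the
-- indexed one, so the planner has no way to match it to the index.
SELECT id FROM docs
WHERE to_tsvector(body) @@ to_tsquery('fox');
```

With a mandatory configuration argument, only the first query form would
exist, and the ambiguity would go away.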

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
