Re: How does the tsearch configuration get selected?

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Michael Paesold <mpaesold(at)gmx(dot)at>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: How does the tsearch configuration get selected?
Date: 2007-06-15 16:07:34
Message-ID: 4672B946.3040809@sigaev.ru
Lists: pgsql-advocacy pgsql-hackers

> Hm, are you trying to say that it's sane to have different tsvectors in
> a column computed under different language settings? Maybe we're all

Yes, I think so.

That might make sense for closely related languages. Norwegian has two dialects,
and one of them has advanced rules for compound words; Russian and Ukrainian have
similar rules, etc. The @@ operation is language (and encoding) independent: it
just uses a strcmp call.
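
To illustrate the point, here is a toy sketch in Python (not PostgreSQL code). The
two "configurations" and the function names are hypothetical stand-ins for real
parsers and dictionaries; the only thing it tries to show is that matching is a
plain string comparison, so vectors built under different configurations can sit
in one column.

```python
# Toy model only: these "configurations" are hypothetical stand-ins,
# not PostgreSQL's real parsers or dictionaries.

def to_lexemes_a(text):
    """Hypothetical configuration A: lowercase and strip a trailing 's'."""
    out = set()
    for word in text.lower().split():
        out.add(word[:-1] if word.endswith("s") and len(word) > 3 else word)
    return out


def to_lexemes_b(text):
    """Hypothetical configuration B: lowercase only, no stemming."""
    return set(text.lower().split())


def matches(vector, query_lexeme):
    # The match knows nothing about languages: it only checks whether
    # exactly the same string is present in the vector.
    return query_lexeme in vector


# One "column" holding vectors built under two different configurations.
column = [to_lexemes_a("Red cars"), to_lexemes_b("Red cars")]
```

In this model the lexeme "car" is found only in the first vector and "cars" only
in the second, while "red" is found in both. Nothing breaks; the search is just
less complete for rows whose lexemes were normalized differently from the query.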

The most common use case for mixing configurations, which I described elsewhere
in this thread, is using two different configurations for indexing (tsvector
creation) and search (tsquery creation). BTW, a thesaurus dictionary could be
used for similar reasons in a search-only configuration.

OpenFTS doesn't use the tsearch2 configuration at all, it has its own
infrastructure for that - so, a tsvector shouldn't carry any information about
configuration.

The most frequent configuration change is adding new stop words, which doesn't
affect the correctness of search. Removing stop words makes it impossible to find
already indexed documents with a query that contains only the removed stop words.
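
A toy sketch of both cases, again in Python rather than real tsearch code. The
helper names, the whitespace tokenizer, and the rule that an empty query matches
nothing are assumptions made for the illustration only.

```python
# Toy model only: make_vector/make_query/search are hypothetical helpers,
# not the behaviour of to_tsvector/to_tsquery in every detail.

def make_vector(text, stop_words):
    """Index-time step: drop stop words, keep the remaining words as lexemes."""
    return {w for w in text.lower().split() if w not in stop_words}


def make_query(text, stop_words):
    """Query-time step: the same filtering, applied to the search string."""
    return {w for w in text.lower().split() if w not in stop_words}


def search(vector, query):
    # All query lexemes must be present; an empty query matches nothing.
    return bool(query) and query <= vector


old_stop = {"the", "and"}
indexed = make_vector("the cat and the hat", old_stop)  # built before the change

# Case 1: a stop word is added later. The query simply loses that word.
added = old_stop | {"hat"}
found_after_adding = search(indexed, make_query("cat hat", added))

# Case 2: a stop word is removed later. It was never stored in the old vector.
removed = old_stop - {"the"}
found_after_removing = search(indexed, make_query("the", removed))
```

Here found_after_adding is True and found_after_removing is False: the old
vector never stored "the", so a query made only of that word cannot match until
the document is reindexed.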

> overthinking the problem. If the tsvector representation is presumed
> language-independent then I could see this being a workable approach.

Actually, we should allow only 'compatible' changes of configuration, but it is
very hard (or even impossible) to formulate rules for that. Any dictionary has
its own specific dictinitoption changes that make it incompatible with itself;
the same goes for compatibility between two dictionaries, or between lists of
dictionaries.

In practice, we haven't seen any disasters after changes in configuration: until
reindexing, search just becomes less precise.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
