default_text_search_config and expression indexes

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paesold <mpaesold(at)gmx(dot)at>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: default_text_search_config and expression indexes
Date: 2007-07-26 22:23:51
Message-ID: 200707262223.l6QMNpo23400@momjian.us
Lists: pgsql-advocacy pgsql-hackers

Oleg Bartunov wrote:
> >> Second, I can't figure out how to reference a non-default
> >> configuration.
> >
> > See the multi-argument versions of to_tsvector etc.
> >
> > I do see a problem with having to_tsvector(config, text) plus
> > to_tsvector(text) where the latter implicitly references a config
> > selected by a GUC variable: how can you tell whether a query using the
> > latter matches a particular index using the former? There isn't
> > anything in the current planner mechanisms that would make that work.
>
> Probably, having default text search configuration is not a good idea
> and we could just require it as a mandatory parameter, which could
> eliminate many confusion with selecting text search configuration.

We have to decide whether we want a default_text_search_config GUC and,
if so, when it can be changed.

Right now there are three ways to create a tsvector (or tsquery):

::tsvector
to_tsvector(value)
to_tsvector(config, value)

(ignoring plainto_tsquery)

Only the last one specifies the configuration. The others use the
configuration named by default_text_search_config. (We had a previous
discussion about what the default value of default_text_search_config
should be, and it was decided that initdb should set it based on a flag
or the locale.)

Now, because most people use a single configuration, they can just set
default_text_search_config and there is no need to specify the
configuration name.

However, expression indexes cause a problem here:

http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

We recommend that users create an expression index on the column they
want to do a full text search on, e.g.

CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(body));

However, the big problem is that an expression used in an expression
index must not change its output based on the value of a GUC variable,
because index entries built under one setting would no longer match what
the expression returns under another, effectively corrupting the index.
In the case above, default_text_search_config controls which
configuration is used, so the output of to_tsvector(body) changes
whenever default_text_search_config changes.
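
To make the dependency concrete, here is a minimal sketch. It assumes
the syntax of released PostgreSQL and the built-in 'english' and
'simple' configurations, which may differ in detail from the patch under
discussion. The sample sentence is made up for illustration.

```sql
-- Assumption: the built-in pg_catalog.english and pg_catalog.simple
-- configurations are available.
SET default_text_search_config = 'pg_catalog.english';
SELECT to_tsvector('The rats were running');  -- stop words dropped, words stemmed

SET default_text_search_config = 'pg_catalog.simple';
SELECT to_tsvector('The rats were running');  -- every word kept, only lowercased
```

The same one-argument call yields different lexemes under the two
settings, which is exactly what an index expression must never do.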

We have a few possible options:

1) Document the problem and do nothing else.
2) Make default_text_search_config a postgresql.conf-only
setting, thereby making it impossible for non-superusers to
change, or make it a superuser-only setting.
3) Remove default_text_search_config and require the
configuration to be specified in each function call.

If we remove default_text_search_config, ::tsvector casting would
become useless as well, because a cast has no way to name a
configuration.
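
For comparison, a sketch of what users would write if the configuration
had to be named in every call, as in option 3. It reuses the pgweb table
and body column from the example above; the 'english' configuration and
the search term are assumptions chosen for illustration.

```sql
-- The index expression names its configuration, so its output does not
-- depend on default_text_search_config.
CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector('english', body));

-- A query can match the index only if it repeats the same expression,
-- including the same configuration name.
SELECT body
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');
```

The cost is verbosity: every index definition and every query has to
spell out the configuration, even on installations that only ever use
one.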

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
