Re: default_text_search_config and expression indexes

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paesold <mpaesold(at)gmx(dot)at>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default_text_search_config and expression indexes
Date: 2007-07-31 01:25:03
Message-ID: 200707310125.l6V1P3M11437@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-advocacy pgsql-hackers

Bruce Momjian wrote:
> We have to decide if we want a GUC default_text_search_config, and if so
> when can it be changed.
>
> Right now there are three ways to create a tsvector (or tsquery)
>
> ::tsvector
> to_tsvector(value)
> to_tsvector(config, value)
>
> (ignoring plainto_tsvector)
>
> Only the last one specifies the configuration. The others use the
> configuration specified by default_text_search_config. (We had an
> previous discussion on what the default value of
> default_text_search_config should be, and it was decided it should be
> set via initdb based on a flag or the locale.)
>
> Now, because most people use a single configuration, they can just set
> default_text_search_config and there is no need to specify the
> configuration name.
>
> However, expression indexes cause a problem here:
>
> http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX
>
> We recommend that users create an expression index on the column they
> want to do a full text search on, e.g.
>
> CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(body));
>
> However, the big problem is that the expressions used in expression
> indexes should not change their output based on the value of a GUC
> variable (because it would corrupt the index), but in the case above,
> default_text_search_config controls what configuration is used, and
> hence the output of to_tsvector is changed if default_text_search_config
> changes.
>
> We have a few possible options:
>
> 1) Document the problem and do nothing else.
> 2) Make default_text_search_config a postgresql.conf-only
> setting, thereby making it impossible to change by non-super
> users, or make it a super-user-only setting.
> 3) Remove default_text_search_config and require the
> configuration to be specified in each function call.
>
> If we remove default_text_search_config, it would also make ::tsvector
> casting useless as well.

OK, I just found a case that I think is going to make #3 a requirement
(remove default_text_search_config).

How is a CREATE INDEX ... to_tsvector(col) going to restore from a
pg_dump? I see no way of guaranteeing that the
default_text_search_config is correct on the restore, and in fact I
don't think we have any way of knowing the default_text_search_config
used for the index.

And if we have to require the configuration name in CREATE INDEX, it has
to be used in WHERE, so we might as well just remove the default
capability and always require the configuration name.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Responses

Browse pgsql-advocacy by date

  From Date Subject
Next Message Joshua D. Drake 2007-07-31 02:09:36 Re: [GENERAL] European users mailing list
Previous Message Josh Berkus 2007-07-30 23:38:28 LinuxWorldExpo Page is up

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2007-07-31 02:14:21 Re: Machine available for community use
Previous Message Decibel! 2007-07-31 01:09:46 Re: ascii() for utf8