default_text_search_config and expression indexes

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paesold <mpaesold(at)gmx(dot)at>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: default_text_search_config and expression indexes
Date: 2007-07-26 22:23:51
Message-ID: 200707262223.l6QMNpo23400@momjian.us
Lists: pgsql-advocacy, pgsql-hackers

Oleg Bartunov wrote:
> >> Second, I can't figure out how to reference a non-default
> >> configuration.
> >
> > See the multi-argument versions of to_tsvector etc.
> >
> > I do see a problem with having to_tsvector(config, text) plus
> > to_tsvector(text) where the latter implicitly references a config
> > selected by a GUC variable: how can you tell whether a query using the
> > latter matches a particular index using the former?  There isn't
> > anything in the current planner mechanisms that would make that work.
> 
> Probably, having default text search configuration is not a good idea
> and we could just require it as a mandatory parameter, which could
> eliminate many confusion with selecting text search configuration.

We have to decide whether we want a default_text_search_config GUC and,
if so, when it can be changed.

Right now there are three ways to create a tsvector (or tsquery):

	::tsvector
	to_tsvector(value)
	to_tsvector(config, value)

(ignoring plainto_tsquery)

Only the last one specifies the configuration. The others use the
configuration specified by default_text_search_config.  (We had a
previous discussion on what the default value of
default_text_search_config should be, and it was decided it should be
set via initdb based on a flag or the locale.)

Now, because most people use a single configuration, they can just set
default_text_search_config and there is no need to specify the
configuration name.
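
As a rough illustration of that convenience, the sketch below compares
the one-argument and two-argument forms. It assumes a configuration
named 'english' exists (that name is my assumption, not something
settled in this thread) and that the GUC can be set per session.

```sql
-- Assumption: a text search configuration named 'english' is installed.
SET default_text_search_config = 'english';

-- One-argument form: the configuration is taken from the GUC.
SELECT to_tsvector('The quick brown foxes');

-- Two-argument form: the configuration is named explicitly.
SELECT to_tsvector('english', 'The quick brown foxes');
```

While the GUC names the same configuration, both calls should return
the same tsvector, which is why single-configuration sites rarely need
the longer form.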

However, expression indexes cause a problem here:

	http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

We recommend that users create an expression index on the column they
want to do a full text search on, e.g.

	CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(body));

However, the big problem is that an expression used in an expression
index must not change its output based on the value of a GUC variable.
Entries built under one setting would no longer match values computed
under another, which effectively corrupts the index.  In the case above,
default_text_search_config controls which configuration is used, so the
output of to_tsvector changes whenever default_text_search_config
changes.
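
A minimal sketch of the dependence, assuming configurations named
'english' and 'simple' are both available (the names are assumptions
for illustration):

```sql
-- Same input, same one-argument call, two different GUC settings.
SET default_text_search_config = 'english';
SELECT to_tsvector('The cats were running');

SET default_text_search_config = 'simple';
SELECT to_tsvector('The cats were running');
```

A stemming, stop-word-removing configuration and a plain lowercasing
one would be expected to return different lexemes for this input.  An
index built from to_tsvector(body) under the first setting would then
hold entries that a query evaluated under the second setting cannot
match.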

We have a few possible options:

	1) Document the problem and do nothing else.
	2) Make default_text_search_config a postgresql.conf-only
	   setting, thereby making it impossible to change by non-super
	   users, or make it a super-user-only setting.
	3) Remove default_text_search_config and require the
	   configuration to be specified in each function call.

If we remove default_text_search_config, ::tsvector casting would also
become useless, since a cast has no way to name a configuration.
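
For reference, here is roughly what option 3 would look like to a
user.  This is only a sketch: the table definition is a hypothetical
minimal one (the documentation example names only pgweb and body), the
'english' configuration name is assumed, and I am assuming to_tsquery
has a two-argument form matching to_tsvector.

```sql
-- Hypothetical minimal table; skip this if pgweb already exists.
CREATE TABLE pgweb (id serial PRIMARY KEY, body text);

-- The configuration is pinned inside the indexed expression, so the
-- index contents no longer depend on any GUC.
CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector('english', body));

-- The query repeats the same expression, configuration included, so
-- the planner can match it against the index.
SELECT id
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');
```

The cost is verbosity: every query has to spell out the configuration
exactly as the index does, or it will not match the index expression.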

-- 
  Bruce Momjian  <bruce(at)momjian(dot)us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
