Re: default_text_search_config and expression indexes

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Mike Rylander" <mrylander(at)gmail(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default_text_search_config and expression indexes
Date: 2007-08-14 22:17:19
Message-ID: 87fy2loj68.fsf@oxford.xeocode.com
Lists: pgsql-advocacy pgsql-hackers

"Mike Rylander" <mrylander(at)gmail(dot)com> writes:

> My application (http://open-ils.org, which runs >80% of the public
> libraries in Georgia, USA, http://gapines.org and
> http://georgialibraries.org/lib/pines.html) requires that I be able to
> search a corpus of bibliographic records in a mix of languages, and
> potentially with mixed stop-word rules, with one query. I cannot know
> ahead of time what languages will be used in the corpus and I cannot
> restrict any one query to one language. To accomplish this, the
> record itself will be inspected inside an INSERT/UPDATE trigger to
> determine the language and type, and use the correct configuration for
> creating the tsvector. This will obviously result in a "mixed"
> tsvector column, but that's exactly what I need. I can filter on
> record language if the user happens to specify a query language (and
> thus configuration), or simply rank the assumed (IP based, perhaps, or
> browser preference based) preferred language higher, or one of a
> hundred other things. But I won't be able to do any of that if
> tsvectors are required to have one and only one configuration per
> column.
>
> Anyway, I felt I needed to provide some outside perspective to this,
> as a user, since it seems that the external viewpoint (my particular
> viewpoint, at least) was missing from the discussion.

This is *extremely* useful. I think it's precisely what we've been missing so
far. At least, what I've been missing.

So the question is what exactly happens in this case? If I search for "the"
does that mean it will ignore matches in English where that's a stop-word but
find me books on tea in French? Is that what I should expect to happen? What
if I search for "earl and the"? Does that find me French books on Earl Grey
Tea but English books on all earls?
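
To make the question concrete, here is a toy model in Python of one possible answer. It is only a sketch, not how tsearch2 or the proposed patch behaves. Everything in it is an assumption made for illustration: the tiny stop-word lists, the sample records, the helper names `to_vector` and `search`, and above all the choice to re-parse the query with each record's own configuration. Real text search also stems words, which this ignores, and "the" stands in for the French word for tea without its accent, as in the question above.

```python
# Toy model only: real text search also stems words and uses dictionaries.
# The stop-word lists below are made-up, minimal stand-ins.
STOP_WORDS = {
    "english": {"the", "and", "a", "of"},
    "french": {"le", "la", "les", "et", "de"},
}


def to_vector(text, config):
    """Lowercase, split on whitespace, drop the configuration's stop-words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS[config]}


def search(terms, records):
    """Return titles that contain every surviving query term.

    Assumption: the query is parsed again for each record, using the
    configuration that record was indexed with.
    """
    hits = []
    for title, config in records:
        wanted = to_vector(" ".join(terms), config)
        # A query made only of stop-words matches nothing under this config.
        if wanted and wanted <= to_vector(title, config):
            hits.append(title)
    return hits


# Hypothetical records: (title, configuration chosen by the trigger).
RECORDS = [
    ("The Earl of Essex", "english"),
    ("Le the Earl Grey", "french"),
    ("The history of tea", "english"),
]

print(search(["the"], RECORDS))
print(search(["earl", "the"], RECORDS))
```

Under these assumptions a search for "the" finds only the French record, and a search for "earl" plus "the" finds the French tea book together with every English record mentioning an earl. If the query were instead parsed once with a single configuration, the results would differ, which is the ambiguity being asked about.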

What happens if I use the same operator directly on the text column? Or
perhaps it's not even possible to specify stop-words when operating on a text
column? Should it be?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
