Re: Bunch of tsearch fixes and cleanup

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Patches" <pgsql-patches(at)postgresql(dot)org>,"Teodor Sigaev" <teodor(at)sigaev(dot)ru>,"Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Subject: Re: Bunch of tsearch fixes and cleanup
Date: 2007-08-23 20:30:05
Message-ID: 46CDEE4D.906@enterprisedb.com
Lists: pgsql-patches

Tom Lane wrote:
> Something that was annoying me yesterday was that it was not clear
> whether we had fixed every single place that uses a tsearch config file
> to assume that the file is in UTF8 and should be converted to database
> encoding.  So I was thinking of hardwiring the "recode" part into
> readstopwords, and using wordop just for the "lowercase" part, which
> seemed to me like a saner division of labor.  That is, UTF8 is a policy
> that we want to enforce globally, but lowercasing maybe not, and this
> still leaves the door open for more processing besides lowercasing.

I think we also want to always run input files through pg_verify_mbstr.
We do it for stopword files and synonym files (though incorrectly), but not
for thesaurus files or ispell files. It's probably best to do that
within the recode function as well.
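
A minimal sketch of that division of labor, written in Python rather than the backend's C purely for illustration. The names recode_line and read_words are hypothetical and are not the actual tsearch functions; the strict UTF-8 decode stands in for the pg_verify_mbstr check, and the encode step stands in for conversion to the database encoding. Verification and recoding always happen in one place, while wordop stays optional per caller.

```python
from typing import Callable, Iterable, List, Optional


def recode_line(raw: bytes, db_encoding: str) -> str:
    """Hypothetical helper: verify the bytes are valid UTF-8, then make
    sure the text can be represented in the database encoding."""
    try:
        # Stand-in for the verification step: reject malformed UTF-8.
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8 in config file: {exc}") from exc
    try:
        # Stand-in for recoding: fail if the database encoding lacks a character.
        text.encode(db_encoding, errors="strict")
    except UnicodeEncodeError as exc:
        raise ValueError(f"cannot convert to {db_encoding}: {exc}") from exc
    return text


def read_words(
    lines: Iterable[bytes],
    db_encoding: str,
    wordop: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Hypothetical reader: every line is verified and recoded;
    wordop (for example lowercasing) is applied only if the caller asks."""
    words: List[str] = []
    for raw in lines:
        word = recode_line(raw, db_encoding).strip()
        if not word:
            continue  # skip blank lines
        words.append(wordop(word) if wordop else word)
    return words
```

A stopword reader would call read_words(lines, encoding, str.lower), while a reader that must preserve case would pass no wordop and still get the same verification and recoding.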

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com