Re: Bunch of tsearch fixes and cleanup

From:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Patches" <pgsql-patches(at)postgresql(dot)org>, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>, "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Subject:	Re: Bunch of tsearch fixes and cleanup
Date:	2007-08-23 20:30:05
Message-ID:	46CDEE4D.906@enterprisedb.com
Lists:	pgsql-patches

Tom Lane wrote:
> Something that was annoying me yesterday was that it was not clear
> whether we had fixed every single place that uses a tsearch config file
> to assume that the file is in UTF8 and should be converted to database
> encoding. So I was thinking of hardwiring the "recode" part into
> readstopwords, and using wordop just for the "lowercase" part, which
> seemed to me like a saner division of labor. That is, UTF8 is a policy
> that we want to enforce globally, but lowercasing maybe not, and this
> still leaves the door open for more processing besides lowercasing.

I think we also want to always run input files through pg_verify_mbstr.
We already do that for stopword files and synonym files (though
incorrectly), but not for thesaurus files or ispell files. It's probably
best to do the verification within the recode function as well.
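
To make the proposed division of labor concrete, here is a minimal sketch in Python rather than the actual C code. The names recode_line and read_word_file are hypothetical and exist only for this illustration. Python's strict UTF-8 decoding stands in for pg_verify_mbstr, and a trial encode stands in for the conversion to the database encoding. Verification and recoding always happen; the per-word operation (for example lowercasing) stays optional.

```python
from typing import Callable, Iterable, List, Optional


def recode_line(raw: bytes, db_encoding: str) -> str:
    """Hypothetical helper: verify one line as UTF-8 and check that it can
    be represented in the database encoding."""
    try:
        text = raw.decode("utf-8")    # strict decode rejects invalid byte sequences
        text.encode(db_encoding)      # fails if a character has no equivalent
    except UnicodeError as exc:
        raise ValueError(f"cannot recode config line {raw!r}: {exc}") from exc
    return text


def read_word_file(lines: Iterable[bytes],
                   db_encoding: str,
                   wordop: Optional[Callable[[str], str]] = None) -> List[str]:
    """Hypothetical reader: recoding is hardwired, wordop is a per-caller choice."""
    words = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue                  # skip blank lines
        word = recode_line(raw, db_encoding)
        if wordop is not None:
            word = wordop(word)       # e.g. str.lower for stopword files
        words.append(word)
    return words
```

With this shape, a thesaurus or ispell reader would get the same verification simply by going through the same recode step, and a caller that does not want lowercasing passes no wordop.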

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Re: Bunch of tsearch fixes and cleanup at 2007-08-23 14:49:27 from Tom Lane

Responses

Re: Bunch of tsearch fixes and cleanup at 2007-08-24 11:39:52 from Heikki Linnakangas
