From: | "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Patches" <pgsql-patches(at)postgresql(dot)org>, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>, "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su> |
Subject: | Re: Bunch of tsearch fixes and cleanup |
Date: | 2007-08-23 14:57:00 |
Message-ID: | 46CDA03C.2010703@enterprisedb.com |
Lists: | pgsql-patches |
Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> - readstopwords calls recode_and_lowerstr directly, instead of using the
>> "wordop" function pointer in StopList struct. All callers used
>> recode_and_lowerstr anyway, so this simplifies the code a little bit. Are
>> there any external dictionary implementations that would require
>> different behavior?
>
> I don't think eliminating wordop altogether is such a hot idea; some
> dictionary could possibly want to do different processing than that.
Ok.
> Something that was annoying me yesterday was that it was not clear
> whether we had fixed every single place that uses a tsearch config file
> to assume that the file is in UTF8 and should be converted to database
> encoding.
I'm afraid there are still a lot of inconsistencies there. I'm just
looking at dict_synonym, and it looks like it has the same problem I
patched in readstopwords: it's using pg_verifymbstr, with database
encoding, to verify the input file. It also seems to be calling
pg_mblen, which depends on database encoding, against UTF-8 encoded
strings. I'll look at those more closely.
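
To illustrate why the pg_mblen usage is a problem, here is a minimal sketch in Python rather than the actual backend code. The helpers utf8_char_len, latin1_char_len and count_chars are hypothetical stand-ins for a per-encoding character-length routine, and LATIN1 is only an assumed example of a database encoding. The sketch shows that stepping through UTF-8 bytes with another encoding's length rule puts character boundaries in the wrong places, whereas verifying and converting the line first gives consistent results.

```python
# Hypothetical stand-ins for per-encoding character-length routines.
# These are NOT PostgreSQL's pg_mblen; they only illustrate the idea.

def utf8_char_len(first_byte: int) -> int:
    """Length in bytes of a UTF-8 character, judged from its lead byte."""
    if first_byte < 0x80:
        return 1
    if first_byte >> 5 == 0b110:
        return 2
    if first_byte >> 4 == 0b1110:
        return 3
    if first_byte >> 3 == 0b11110:
        return 4
    raise ValueError("invalid UTF-8 lead byte")


def latin1_char_len(first_byte: int) -> int:
    """In a single-byte encoding such as LATIN1 every character is one byte."""
    return 1


def count_chars(data: bytes, char_len) -> int:
    """Walk a byte string character by character using the given length rule."""
    pos = 0
    count = 0
    while pos < len(data):
        pos += char_len(data[pos])
        count += 1
    return count


# A line as it would be stored in a UTF-8 config file ("naive" with i-diaeresis).
line = "na\u00efve".encode("utf-8")  # 6 bytes: the accented letter takes two

# Correct length rule for the file's encoding: 5 characters.
chars_as_utf8 = count_chars(line, utf8_char_len)

# Assumed database encoding (LATIN1) applied to UTF-8 bytes: 6 "characters",
# so the two-byte character is split in the middle.
chars_as_latin1 = count_chars(line, latin1_char_len)

# Safer order: verify the bytes as UTF-8, convert to the database encoding,
# and only then use the database encoding's length rule.
converted = line.decode("utf-8").encode("latin-1")
chars_after_conversion = count_chars(converted, latin1_char_len)
```

With a multibyte database encoding the mismatch can be worse than a miscount, since the length rule may step past the end of a character or of the string.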
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com