Re: tsearch2 in postgresql 8.3.1 - invalid byte sequence for encoding "UTF8": 0xc3

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: patrick <patrick(at)11h11(dot)com>, pgsql-hackers(at)postgreSQL(dot)org, PG-General Mailing List <pgsql-general(at)postgreSQL(dot)org>
Subject: Re: tsearch2 in postgresql 8.3.1 - invalid byte sequence for encoding "UTF8": 0xc3
Date: 2008-03-19 23:55:40
Message-ID: 22967.1205970940@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

Richard Huxton <dev(at)archonet(dot)com> writes:
> Missed the mailing list on the last reply
>> patrick wrote:
>>> those queries are not working, same message:
>>> ERROR: invalid byte sequence for encoding "UTF8": 0xc3
>>>
>>> what I found is that in postgresql.conf, if I change
>>> default_text_search_config from pg_catalog.french to
>>> pg_catalog.english, then the query works fine.

I am just about convinced the problem is with french.stop.

There is more to that error message than meets the eye: 0xc3 is a valid
first byte for a two-byte UTF8 character, so the only way that the
message would look just like that is if 0xc3 is the last byte in the
presented string. Looking at french.stop, the only plausible place for
this to happen is the line

    à

(that's \303\240 or 0xc3 0xa0). I am thinking that something decided
the \240 was junk and removed it.
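The argument rests on how UTF-8 announces sequence length in the first byte: 0xc3 (binary 110xxxxx) starts a two-byte character, so a report naming only 0xc3 means the input ran out before the continuation byte arrived. The Python sketch below illustrates that reasoning; it is a simplified hypothetical validator (the names `expected_length` and `first_bad_sequence` are illustrative), not PostgreSQL's own verification code, and it does not check overlong three/four-byte forms or surrogates.

```python
def expected_length(lead: int) -> int:
    """Length of a UTF-8 sequence as announced by its first byte.

    Returns 0 for bytes that can never start a sequence: continuation
    bytes 0x80-0xbf and the invalid leads 0xc0, 0xc1, 0xf5-0xff.
    """
    if lead < 0x80:
        return 1            # 0xxxxxxx: ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2            # 110xxxxx: two-byte sequence (0xc3 is one of these)
    if 0xE0 <= lead <= 0xEF:
        return 3            # 1110xxxx: three-byte sequence
    if 0xF0 <= lead <= 0xF4:
        return 4            # 11110xxx: four-byte sequence
    return 0


def first_bad_sequence(data: bytes):
    """Return (offset, offending_bytes) for the first invalid sequence, or None.

    Hypothetical simplified check: validates lead bytes and the range of
    continuation bytes only.
    """
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 0:
            return i, data[i:i + 1]
        chunk = data[i:i + n]   # shorter than n if the input ends early
        truncated = len(chunk) < n
        bad_tail = any(not 0x80 <= b <= 0xBF for b in chunk[1:])
        if truncated or bad_tail:
            return i, chunk
        i += n
    return None


if __name__ == "__main__":
    print(first_bad_sequence("à".encode("utf-8")))  # the intact pair 0xc3 0xa0
    print(first_bad_sequence(b"\xc3"))              # lone lead byte at end of input
    print(first_bad_sequence(b"\xc3 "))             # lead byte followed by a space
```

In this sketch, only the truncated input yields a report consisting of 0xc3 alone; if any byte follows the lead byte, the offending chunk includes it as well, which mirrors the inference that 0xc3 was the last byte of the string.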

I wonder whether the dictionaries ought not be reading their data files
in binary mode. They appear to all be using AllocateFile(filename, "r")
which means that we're at the mercy of whatever text-mode conversion
Windows feels like doing.
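Reading in binary mode means no text layer can rewrite bytes before the dictionary parser sees them. The Python sketch below illustrates the general hazard by analogy only; it is not PostgreSQL code and does not use AllocateFile. The function names, the sample file contents, and the mechanism shown (decoding as a single-byte charset, where 0xa0 is a no-break space, then trimming whitespace) are hypothetical: one plausible way the \240 could be dropped, not a confirmed diagnosis.

```python
import os
import tempfile

# Hypothetical stop-word file: UTF-8, one word per line, CRLF line endings.
STOP_BYTES = "le\r\nà\r\nde\r\n".encode("utf-8")


def read_words_single_byte(path: str) -> list[bytes]:
    """Risky: open in text mode as Latin-1 and trim Unicode whitespace.

    Text mode translates CRLF to LF. In Latin-1, byte 0xa0 decodes to
    U+00A0 (no-break space), which str.strip() removes, so the UTF-8
    pair 0xc3 0xa0 loses its second byte.
    """
    with open(path, "r", encoding="latin-1") as f:
        return [line.strip().encode("latin-1") for line in f if line.strip()]


def read_words_binary(path: str) -> list[str]:
    """Safer: read raw bytes, trim only ASCII whitespace, then decode UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    words = []
    for raw in data.split(b"\n"):
        word = raw.strip()      # bytes.strip() removes ASCII whitespace only
        if word:
            words.append(word.decode("utf-8"))  # raises on a truncated sequence
    return words


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample.stop")
        with open(path, "wb") as f:
            f.write(STOP_BYTES)
        print(read_words_single_byte(path))  # the "à" entry comes back as a lone 0xc3
        print(read_words_binary(path))
```

The binary reader keeps the byte-level view until the final, strict decode, so any corruption surfaces as an explicit error instead of a silently shortened word.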

regards, tom lane