Re: Yet another problem with ILIKE and UTF-8

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Gergely Bor" <borg42(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Yet another problem with ILIKE and UTF-8
Date: 2007-10-25 16:35:08
Message-ID: 17636.1193330108@sss.pgh.pa.us
Lists: pgsql-bugs

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Gergely Bor" <borg42(at)gmail(dot)com> writes:
>> Environment B: Debian lenny/sid, kernel version 2.6.20.1, glibc
>> 2.6.1-5, psql 8.2.5, lc_* is hu_HU, all encodings (client, server,
>> DB) are UTF-8.

> I'm not sure this is the right answer, but what happens if you initdb a
> database on the Debian box with lc_* set to hu_HU.UTF-8?

On my Fedora Core 6 machine, the encoding implied by LANG=hu_HU
seems to be LATIN2, not UTF8. It's possible that Debian's glibc
does this differently than Fedora's, but not real likely.
So I think Greg has probably identified the problem correctly:
you have a locale-vs-encoding mismatch on the Debian setup.
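
To see which encoding a locale name implies on a given machine, something
like the following sketch may help. It relies on the POSIX "locale charmap"
command; the function names are made up for illustration and are not part
of PostgreSQL. It also assumes the locale is actually installed: if it is
not, the C library typically falls back to the C locale and reports that
codeset instead.

```shell
#!/bin/sh
# Illustrative helpers only; these names are hypothetical, not PostgreSQL tools.

# Lower-case a codeset name and drop punctuation, so that
# "UTF-8", "utf8" and "UTF8" all compare equal.
normalize_codeset() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]'
}

# Succeed if the given codeset name denotes UTF-8.
is_utf8_codeset() {
    [ "$(normalize_codeset "$1")" = "utf8" ]
}

# Print the codeset the C library associates with a locale name.
# Assumption: the locale is installed; otherwise the result describes
# the fallback locale, not the one that was asked for.
locale_codeset() {
    LC_ALL="$1" locale charmap 2>/dev/null
}
```

For example, "locale_codeset hu_HU" can be compared against
"locale_codeset hu_HU.UTF-8", and the result passed to is_utf8_codeset
before choosing -E utf8. Note that PostgreSQL uses its own encoding names,
so a codeset reported as ISO-8859-2 corresponds to what initdb calls LATIN2;
this sketch only answers the narrower question of whether the locale is a
UTF-8 one.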

FWIW, 8.3 will reject this sort of misconfiguration:

$ LANG=hu_HU initdb -E utf8
The files belonging to this database system will be owned by user "tgl".
This user must also own the server process.

The database cluster will be initialized with locale hu_HU.
initdb: encoding mismatch
The encoding you selected (UTF8) and the encoding that the
selected locale uses (LATIN2) do not match. This would lead to
misbehavior in various character string processing functions.
Rerun initdb and either do not specify an encoding explicitly,
or choose a matching combination.

regards, tom lane
