Hello Gregory, hello all,
I've tested the DB created with hu_HU.UTF8 lc_* settings and it works
like a charm! We'll re-create the database and move the contents. It's
a nice improvement of 8.3 to detect and disallow such
Summing up the issue & fix (for googling):
- Version: postgresql 8.2.x (and possibly older ones as well)
- Symptom: UTF-8 values are stored correctly but some ILIKE queries
don't return results as expected, even when searching for ASCII
- Reason: database misconfiguration -> UPPER()/LOWER() cannot handle
UTF-8 chars -> ILIKE cannot handle UTF-8 chars. The error above occurs
if the DB is created with UTF-8 internal encoding, but lc_* settings
are not referring UTF-8 locales. (Eg. bad locale reference: "hu_HU",
good locale reference: "hu_HU.UTF8".)
- Fix: a new database (or DB cluster?) should be created with the
corrected lc_* settings. If there's data already, it should be dumped
and restored (see steps below).
- Additional info: lc_* settings (and a whole lot of others) can be
displayed with the psql command "SHOW ALL".
- Additional info: in case the DB is configured incorrectly, psql 8.2
just fails to match rows correctly (without any errors), but 8.3+ will
reject this bad configuration in time.
Thanks a lot, guys!
On 10/25/07, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
> "Gergely Bor" <borg42(at)gmail(dot)com> writes:
> > We'll google the initdb stuff and try it ASAP.
> > What I've tried is LOWER and UPPER, and they seem to return trash for
> > Hungarian UTF-8 characters, but they handle ASCII well. (Hmmmm...
> > maybe ILIKE requires LOWER and UPPER to work? Would not be
> > illogical...)
> It does. I think it works by just downcasing both strings. It's possible to do
> better but tricky. I think 8.3 has an optimization for that for single-byte
> encodings but it had to be disabled for utf-8 in the end.
> If it's returning trash for those characters then it's not prepared to handle
> UTF-8 data. You have to use an encoding compatible with your locale and
> If you want to store UTF-8 data I suggest you
> . add hu_HU.UTF-8 to /etc/locale.gen,
> . rerun /usr/sbin/locale-gen
> . pg_dump your database
> . re-initdb with the locale set to hu_HU.UTF-8
> . pg_restore your data.
> Unfortunately that'll take quite a while and involve down-time.
> You should probably do this in a second directory aside from your existing
> database just in case you've created any invalidly encoded utf-8 strings.
> You'll have to fix them before restoring. (Actually I don't recall which
> version got strict about that.)
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
In response to
pgsql-bugs by date
|Next:||From: Tom Lane||Date: 2007-10-26 14:29:02|
|Subject: Re: Possible planner bug/regression introduced in 8.2.5 |
|Previous:||From: Lee Packham||Date: 2007-10-26 14:08:21|
|Subject: Re: BUG #3699: Fails to compile DTrace Support|