Re: equal operator fails on two identical strings if initdb

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kent Tong <kent(at)cpttm(dot)org(dot)mo>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: equal operator fails on two identical strings if initdb
Date: 2004-11-25 03:52:56
Message-ID: 24244.1101354776@sss.pgh.pa.us
Lists: pgsql-bugs

Kent Tong <kent(at)cpttm(dot)org(dot)mo> writes:
> You mean the OS fails to convert unicode strings to Big5 or the
> OS assumes the bytes are already in Big5?

The latter.

> It is the locale used for initdb or the default system locale
> set in Windows that is used by the collation routines that you
> mentioned above?

The former.

The real problem here, IMHO, is that Postgres allows you to select a
"database encoding" setting that is different from the encoding implied
by the initdb locale (i.e., the LC_CTYPE setting). If you make this
mistake, PG will carefully store data byte sequences in the specified
"database encoding" ... and then pass them to strcoll() for comparison
... and strcoll() will assume that the data is in the encoding
associated with LC_CTYPE.
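
To make the mismatch concrete, here is a minimal sketch in Python. It does not call strcoll() or touch Postgres; Python's codecs merely stand in for "the encoding the comparison routine assumes", and the helper name read_as and the sample string are made up for illustration. It shows that bytes stored in one encoding no longer represent the same text when read under another.

```python
def read_as(data: bytes, assumed_encoding: str) -> str:
    """Interpret raw bytes the way a routine bound to `assumed_encoding` would.

    Hypothetical helper for illustration; undecodable bytes become U+FFFD
    instead of raising, so the misreading is visible rather than fatal.
    """
    return data.decode(assumed_encoding, errors="replace")


text = "中文"  # sample non-ASCII string (an assumption, any such text works)

# Bytes as they would be stored under a UTF-8 "database encoding".
stored_utf8 = text.encode("utf-8")

# The same bytes, read by something that assumes Big5.
misread = read_as(stored_utf8, "big5")

# When storage and reader agree on the encoding, the text survives intact.
agreed = read_as(text.encode("big5"), "big5")

print("stored bytes:", stored_utf8.hex(" "))
print("misread equals original:", misread == text)
print("agreed equals original:", agreed == text)
```

The reader in this sketch never learns which encoding the bytes were written in, which is the same position strcoll() is in: it can only trust the locale it was given.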

This is partially bad design on our part (we should really not have
invented a per-database encoding selection when the locale setting is
not per-database) and partially bad design on the part of the C standard
(which doesn't provide any very sane way to find out what encoding is
implied by an LC_CTYPE setting).

I think the only real fix is to abandon the C library's locale routines
and find or write our own library with a better API. This has been on
the TODO list for a long time but no one's quite wished to face up to
doing it ...

In the meantime, make sure your encoding setting agrees with the
LC_CTYPE value that initdb used.
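
As a rough sketch of that check, the Python below compares two encoding names after normalizing them through Python's codec registry. The function name same_encoding is hypothetical, and using Python codec names as the common ground is an assumption: some Postgres encoding names have no Python codec, and near-equivalent encodings registered as distinct codecs will be reported as different. How you obtain the two names depends on your platform and Postgres version.

```python
import codecs


def same_encoding(db_encoding: str, locale_codeset: str) -> bool:
    """Return True when both names resolve to the same Python codec.

    Hypothetical helper. Raises LookupError if either name is unknown to
    Python, so an unrecognized name is never silently treated as a match.
    """
    return codecs.lookup(db_encoding).name == codecs.lookup(locale_codeset).name


# Spelling differences in the names do not matter ...
print(same_encoding("UTF8", "utf-8"))
# ... but genuinely different encodings are flagged.
print(same_encoding("UTF8", "BIG5"))
```

A False result here only means the names resolve differently in Python; treat it as a prompt to look closer, not as proof of a problem.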

regards, tom lane
