Re: Enforcing database encoding and locale match

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enforcing database encoding and locale match
Date: 2007-09-28 22:58:53
Message-ID: 21237.1191020333@sss.pgh.pa.us
Lists: pgsql-hackers

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> On Solaris I got following problematic locales: [...]

I tried this program on Mac OS X 10.4.10 (the current release) and found
that what the OS mostly returns is just the encoding portion of the
locale name, for instance:

sv_SE.ISO8859-15 ... ISO8859-15 - OK
sv_SE.UTF-8 ... UTF-8 - OK
tr_TR ... - NO MATCH
tr_TR.ISO8859-9 ... ISO8859-9 - OK
tr_TR.UTF-8 ... UTF-8 - OK
uk_UA ... - NO MATCH
uk_UA.ISO8859-5 ... ISO8859-5 - OK
uk_UA.KOI8-U ... KOI8-U - NO MATCH
uk_UA.UTF-8 ... UTF-8 - OK
zh_CN ... - NO MATCH
zh_CN.eucCN ... eucCN - OK
zh_CN.GB18030 ... GB18030 - NO MATCH
zh_CN.GB2312 ... GB2312 - OK
zh_CN.GBK ... GBK - NO MATCH
zh_CN.UTF-8 ... UTF-8 - OK
zh_HK ... - NO MATCH
zh_HK.Big5HKSCS ... Big5HKSCS - NO MATCH
zh_HK.UTF-8 ... UTF-8 - OK
zh_TW ... - NO MATCH
zh_TW.Big5 ... Big5 - NO MATCH
zh_TW.UTF-8 ... UTF-8 - OK
C ... US-ASCII - NO MATCH
POSIX ... US-ASCII - NO MATCH

They didn't *quite* hard-wire it that way, as evidenced by the C/POSIX
results, but certainly the empty-string results are entirely useless.
Perhaps we should file a bug with Apple. However, some poking around
in /usr/share/locale indicates that there's a consistent interpretation
to be made:

g42:/usr/share/locale tgl$ ls -l ??_??/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 af_ZA/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
-r--r--r-- 1 root wheel 3272 Mar 20 2005 am_ET/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 be_BY/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 bg_BG/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 ca_ES/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 cs_CZ/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 da_DK/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 de_AT/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 de_CH/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 de_DE/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 el_GR/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 en_AU/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 17 Apr 26 2006 en_CA/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
(etc etc)

The only one that's not actually a symlink to the standard UTF-8 ctype
is am_ET/LC_CTYPE, which is identical to am_ET.UTF-8/LC_CTYPE.
So I think we can get away with something like

#ifdef __darwin__
    if (strlen(sys) == 0)
        // assume UTF8
#endif
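
A minimal sketch of how that fallback might be fleshed out, assuming `sys` is the string returned by nl_langinfo(CODESET) for the locale under test. The helper names effective_codeset and codeset_for_locale and the macro EMPTY_CODESET_MEANS_UTF8 are hypothetical, introduced only for illustration, and `__darwin__` is assumed to be defined by the build system on Mac OS X. This is not the code that was actually committed.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the build defines __darwin__ when targeting Mac OS X. */
#ifdef __darwin__
#define EMPTY_CODESET_MEANS_UTF8 1
#else
#define EMPTY_CODESET_MEANS_UTF8 0
#endif

/*
 * Hypothetical helper: map the raw nl_langinfo(CODESET) result to the
 * codeset name we act on.  When empty_means_utf8 is nonzero, an empty
 * string is treated as UTF-8; anything else is passed through unchanged.
 */
static const char *
effective_codeset(const char *sys, int empty_means_utf8)
{
    if (empty_means_utf8 && strlen(sys) == 0)
        return "UTF-8";
    return sys;
}

/*
 * Hypothetical helper: store the effective codeset of LC_CTYPE locale
 * "ctype" in buf.  Returns 0 on success, -1 if the locale cannot be
 * selected or memory runs out.  The caller's LC_CTYPE is restored.
 */
static int
codeset_for_locale(const char *ctype, char *buf, size_t buflen)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    char       *save;

    if (cur == NULL || buflen == 0)
        return -1;

    /* Copy the current setting: later setlocale calls may overwrite it. */
    save = malloc(strlen(cur) + 1);
    if (save == NULL)
        return -1;
    strcpy(save, cur);

    if (setlocale(LC_CTYPE, ctype) == NULL)
    {
        free(save);             /* locale unchanged on failure */
        return -1;
    }

    strncpy(buf,
            effective_codeset(nl_langinfo(CODESET), EMPTY_CODESET_MEANS_UTF8),
            buflen - 1);
    buf[buflen - 1] = '\0';

    setlocale(LC_CTYPE, save);  /* put the original locale back */
    free(save);
    return 0;
}
```

Passing the fallback switch as a parameter keeps the string mapping testable on any platform, while the macro confines the platform-specific assumption to one place.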

I suppose we'll need a few more hacks like this as the beta-test results
begin to roll in ...

regards, tom lane
