Re: Enforcing database encoding and locale match

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enforcing database encoding and locale match
Date: 2007-09-28 22:58:53
Message-ID: 21237.1191020333@sss.pgh.pa.us
Lists: pgsql-hackers

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> On Solaris I got following problematic locales: [...]

I tried this program on Mac OS X 10.4.10 (the current release) and found
out that what that OS mostly returns is the encoding portion of the
locale name, for instance:

sv_SE.ISO8859-15        ... ISO8859-15 - OK
sv_SE.UTF-8             ... UTF-8      - OK
tr_TR                   ...            - NO MATCH
tr_TR.ISO8859-9         ... ISO8859-9  - OK
tr_TR.UTF-8             ... UTF-8      - OK
uk_UA                   ...            - NO MATCH
uk_UA.ISO8859-5         ... ISO8859-5  - OK
uk_UA.KOI8-U            ... KOI8-U     - NO MATCH
uk_UA.UTF-8             ... UTF-8      - OK
zh_CN                   ...            - NO MATCH
zh_CN.eucCN             ... eucCN      - OK
zh_CN.GB18030           ... GB18030    - NO MATCH
zh_CN.GB2312            ... GB2312     - OK
zh_CN.GBK               ... GBK        - NO MATCH
zh_CN.UTF-8             ... UTF-8      - OK
zh_HK                   ...            - NO MATCH
zh_HK.Big5HKSCS         ... Big5HKSCS  - NO MATCH
zh_HK.UTF-8             ... UTF-8      - OK
zh_TW                   ...            - NO MATCH
zh_TW.Big5              ... Big5       - NO MATCH
zh_TW.UTF-8             ... UTF-8      - OK
C                       ... US-ASCII   - NO MATCH
POSIX                   ... US-ASCII   - NO MATCH
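For readers without the test program at hand, here is a minimal sketch of how such a per-locale probe might look. It assumes the program switches LC_CTYPE with setlocale() and then asks nl_langinfo(CODESET) for the codeset name; the message above does not spell that out, and the helper name probe_codeset is hypothetical.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper (not from the original program): switch LC_CTYPE to
 * locale_name and copy the codeset name reported by nl_langinfo(CODESET)
 * into buf.  Returns 0 on success, -1 if the locale cannot be selected or
 * the name does not fit.  The previous LC_CTYPE setting is restored.
 */
static int
probe_codeset(const char *locale_name, char *buf, size_t buflen)
{
	char		saved[256];
	const char *cur;
	const char *codeset;
	int			rc = -1;

	assert(locale_name != NULL && buf != NULL && buflen > 0);

	/* Remember the current setting; the returned string may be reused. */
	cur = setlocale(LC_CTYPE, NULL);
	if (cur == NULL || strlen(cur) >= sizeof(saved))
		return -1;
	strcpy(saved, cur);

	if (setlocale(LC_CTYPE, locale_name) != NULL)
	{
		codeset = nl_langinfo(CODESET);
		if (codeset != NULL && strlen(codeset) < buflen)
		{
			/* May legitimately be "", as in the tr_TR case above. */
			strcpy(buf, codeset);
			rc = 0;
		}
	}

	setlocale(LC_CTYPE, saved);
	return rc;
}
```

A caller would loop over the locale names of interest and compare the returned string against whatever encoding names it recognizes; an empty string is what shows up as the blank column in the output above.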

They didn't *quite* hard-wire it that way, as evidenced by the C/POSIX
results, but certainly the empty-string results are entirely useless.
Perhaps we should file a bug with Apple.  However, some poking around
in /usr/share/locale indicates that there's a consistent interpretation
to be made:

g42:/usr/share/locale tgl$ ls -l ??_??/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 af_ZA/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
-r--r--r--   1 root  wheel  3272 Mar 20  2005 am_ET/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 be_BY/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 bg_BG/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 ca_ES/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 cs_CZ/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 da_DK/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 de_AT/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 de_CH/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 de_DE/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 el_GR/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 en_AU/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
lrwxr-xr-x   1 root  wheel    17 Apr 26  2006 en_CA/LC_CTYPE@ -> ../UTF-8/LC_CTYPE
(etc etc)

The only one that's not actually a symlink to the standard UTF-8 ctype
is am_ET/LC_CTYPE, which is identical to am_ET.UTF-8/LC_CTYPE.
So I think we can get away with something like

#ifdef __darwin__
	if (strlen(sys) == 0)
		// assume UTF8
#endif
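Spelled out as compilable code, the idea might look like the sketch below. This is only an illustration, not the eventual patch: the function name effective_codeset and the flag argument are hypothetical, and the default is keyed on __APPLE__ (which Apple's compilers predefine) in addition to __darwin__, on the assumption that the latter is only available inside the PostgreSQL build.

```c
#include <assert.h>
#include <string.h>

/* Assumption: an empty codeset name means UTF-8 only on Darwin. */
#if defined(__APPLE__) || defined(__darwin__)
#define EMPTY_CODESET_MEANS_UTF8 1
#else
#define EMPTY_CODESET_MEANS_UTF8 0
#endif

/*
 * Hypothetical helper: given the codeset name the platform reported (sys),
 * return the name to use for matching.  When empty_means_utf8 is nonzero,
 * an empty name is treated as "UTF-8"; anything else is passed through.
 */
static const char *
effective_codeset(const char *sys, int empty_means_utf8)
{
	assert(sys != NULL);

	if (empty_means_utf8 && strlen(sys) == 0)
		return "UTF-8";
	return sys;
}
```

Normal callers would pass EMPTY_CODESET_MEANS_UTF8 as the second argument; taking it as a parameter just makes the behavior easy to exercise on a non-Darwin machine.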

I suppose we'll need a few more hacks like this as the beta-test results
begin to roll in ...

			regards, tom lane
