Chinese initdb on Windows

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Chinese initdb on Windows
Date: 2011-03-21 19:29:57
Message-ID: 4D87A735.2030002@enterprisedb.com
Lists: pgsql-hackers

On Windows, if the OS locale is set to "Chinese (Simplified, PRC)",
initdb fails:

X:\>C:\pgsql-install\bin\initdb.exe -D data2
The files belonging to this database system will be owned by user "Heikki".
This user must also own the server process.

The database cluster will be initialized with locale Chinese (Simplified)_People's Republic of China.936.
initdb: locale Chinese (Simplified)_People's Republic of China.936 requires unsupported encoding GBK
Encoding GBK is not allowed as a server-side encoding.
Rerun initdb with a different locale selection.

The easy workaround for that is to specify --encoding=UTF-8, as UTF-8
can be used with any locale on Windows. How about doing that
automatically in initdb? Now that we have the smarts in psql to detect
current encoding from the environment and set client_encoding
accordingly, it Just Works. Attached is a patch for that.
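
To make the proposed behaviour concrete, here is a minimal sketch of the fallback decision. It is not the attached patch: the function names, the `on_windows` flag and the short client-only encoding list are assumptions made for illustration (only GBK is taken from the error output above).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the real server-side encoding check.
 * The list is illustrative and incomplete; GBK is the case from the
 * error message above.
 */
static bool
is_server_side_encoding(const char *name)
{
    static const char *const client_only[] = {"GBK", "SJIS", "BIG5"};
    size_t      i;

    for (i = 0; i < sizeof(client_only) / sizeof(client_only[0]); i++)
    {
        if (strcmp(name, client_only[i]) == 0)
            return false;
    }
    return true;
}

/*
 * Pick the encoding for a new cluster when the user gave no --encoding.
 * Returns the locale's own encoding if the server can use it, "UTF-8"
 * as a fallback on Windows, or NULL to signal that initdb should fail
 * as it does today.
 */
const char *
choose_server_encoding(const char *locale_encoding, bool on_windows)
{
    if (is_server_side_encoding(locale_encoding))
        return locale_encoding;
    if (on_windows)
        return "UTF-8";     /* usable with any locale on Windows */
    return NULL;
}
```

With this sketch, `choose_server_encoding("GBK", true)` yields "UTF-8", while the same call with `on_windows` set to false still reports failure.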

Once you get past that, however, there's another issue:

> ...
> creating directory data2 ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 32MB
> creating configuration files ... ok
> creating template1 database in data2/base/1 ... ok
> initializing pg_authid ... FATAL: database locale is incompatible with operating system
> DETAIL: The database was initialized with LC_COLLATE "Chinese (Simplified)_Peoples Republic of China.936", which is not recognized by setlocale().
> HINT: Recreate the database with another locale or install the missing locale.
> child process exited with exit code 1

The problem is probably the apostrophe in the locale name, although it
seems to be missing from the above error message. setlocale() has a
known problem with locale names that have dots in the country name, and
it looks like it has similar issues with apostrophes.
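
A small probe along the following lines can be used to check whether setlocale() accepts a given name on a particular system. This is only a sketch: the helper name `locale_is_recognized` is made up for illustration, and which names are accepted depends entirely on the platform's C runtime.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns true if setlocale() accepts `name` for `category`.
 * The previous locale is restored before returning.
 */
bool
locale_is_recognized(int category, const char *name)
{
    const char *current = setlocale(category, NULL);
    char       *saved;
    bool        ok;

    if (current == NULL)
        return false;

    /* Copy the current name: later setlocale() calls may overwrite it. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return false;
    strcpy(saved, current);

    /* setlocale() returns NULL when it does not recognize the name. */
    ok = (setlocale(category, name) != NULL);

    setlocale(category, saved);
    free(saved);
    return ok;
}
```

Calling `locale_is_recognized(LC_COLLATE, "Chinese (Simplified)_People's Republic of China.936")` on an affected Windows system would be one way to reproduce the failure outside initdb.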

Fortunately, there are aliases for those problematic locales on Windows
that don't have dots or apostrophes in the names. We did some testing at
EnterpriseDB of various locales on various versions of Windows, and came
up with the following mappings:

"*_Hong Kong S.A.R.*" -> "*_HKG.*"
"*_U.A.E.*" -> "*_ARE.*"
"*_People's Republic of China.*" -> "*_China.*"
"China_Macau S.A.R..950" -> "ZHM"

The first three mappings map the full country name to an abbreviation
that is also accepted by Windows' setlocale(). See
http://msdn.microsoft.com/en-us/library/cdax410z%28v=vs.71%29.aspx. ARE
is not on that list, but seems to work.

Macau is trickier. ZHM is not an abbreviation of the country, but of the
whole locale, so we can't replace just the country part. Unlike the other
mappings, this one will therefore not work for a locale such as
"Finnish_Macau S.A.R..950". Nevertheless, it works for the common case.
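
The mappings above could be applied roughly as in the sketch below. It is not the attached patch. The names `country_alias` and `map_locale_name` are hypothetical, and it assumes one reading of the wildcard patterns: only the country name is swapped and the code page suffix (for example ".936") is kept as-is.

```c
#include <stdio.h>
#include <string.h>

/* One substitution: the country part `from` is replaced by `to`. */
struct country_alias
{
    const char *from;
    const char *to;
};

static const struct country_alias country_aliases[] = {
    {"_Hong Kong S.A.R.", "_HKG"},
    {"_U.A.E.", "_ARE"},
    {"_People's Republic of China", "_China"},
};

/*
 * Write a setlocale()-friendly version of `locale` into `out` and
 * return `out`. Names without a known mapping are copied unchanged.
 * Output longer than `outsize` is truncated by snprintf().
 */
const char *
map_locale_name(const char *locale, char *out, size_t outsize)
{
    size_t      i;

    /* Macau: the alias stands for the whole locale, not the country. */
    if (strcmp(locale, "China_Macau S.A.R..950") == 0)
    {
        snprintf(out, outsize, "%s", "ZHM");
        return out;
    }

    for (i = 0; i < sizeof(country_aliases) / sizeof(country_aliases[0]); i++)
    {
        const struct country_alias *a = &country_aliases[i];
        const char *hit = strstr(locale, a->from);

        if (hit != NULL)
        {
            /* prefix before the match + alias + remainder after the match */
            snprintf(out, outsize, "%.*s%s%s",
                     (int) (hit - locale), locale,
                     a->to, hit + strlen(a->from));
            return out;
        }
    }

    snprintf(out, outsize, "%s", locale);
    return out;
}
```

Because the Macau case is matched as a whole string, a name like "Finnish_Macau S.A.R..950" passes through this sketch unchanged, which mirrors the limitation described above.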

Any objections to the 2nd attached patch, which adds the mapping of
those locale names on Windows?

I'm thinking it's not too late to do this in 9.1.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
initdb-fallback-to-utf8-on-windows.patch text/x-diff 1.1 KB
initdb-map-broken-windows-locales.patch text/x-diff 4.4 KB
