Windows default locale vs initdb

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Windows default locale vs initdb
Date: 2021-04-19 05:42:51
Message-ID: CA+hUKGJ=XThErgAQRoqfCy1bKPxXVuF0=2zDbB+SxDs59pv7Fw@mail.gmail.com
Lists: pgsql-hackers

Hi,

Moving this topic into its own thread from the one about collation
versions, because it concerns pre-existing problems, and that thread
is long.

Currently initdb sets up template databases with old-style Windows
locale names reported by the OS, and they seem to have caused us quite
a few problems over the years:

db29620d "Work around Windows locale name with non-ASCII character."
aa1d2fc5 "Another attempt at fixing Windows Norwegian locale."
db477b69 "Deal with yet another issue related to "Norwegian (Bokmål)"..."
9f12a3b9 "Tolerate version lookup failure for old style Windows locale..."

... and probably more, and also various threads about, for example,
"German_German.1252" vs "German_Switzerland.1252", which seem to get
confused, badly canonicalised or rejected somewhere in the mix.

I hadn't focused on any of that before, being a non-Windows-user, but
the entire contents of win32setlocale.c support the theory that
Windows' manual meant what it said when it said[1]:

"We do not recommend this form for locale strings embedded in
code or serialized to storage, because these strings are more likely
to be changed by an operating system update than the locale name
form."

I suppose that was the only form available at the time the code was
written, so there was no choice. The question we asked ourselves
multiple times in the other thread was how we're supposed to get to
the modern BCP 47 form when creating the template databases. It looks
like one possibility, since Vista, is to call
GetUserDefaultLocaleName()[2], which doesn't appear to have been
discussed before on this list. That doesn't allow you to ask for the
default for each individual category, but I don't know if that is even
a concept for Windows user settings. It may be that some of the other
nearby functions give a better answer for some reason. But one thing
is clear from a test that someone kindly ran for me: it reports
standardised strings like "en-NZ", not strings like "English_New
Zealand.1252".
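
To make the idea concrete, here is a minimal sketch in C of what such a
call could look like. It is an illustration under assumptions, not
existing PostgreSQL code: the function names
get_user_default_locale_name() and looks_like_bcp47_name() are
hypothetical, and the second one is only a rough heuristic for telling
"en-NZ" apart from "English_New Zealand.1252", not a validator.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/*
 * Hypothetical helper: fetch the user's default locale name in the
 * modern form (e.g. "en-NZ") and convert it to UTF-8.  Returns false
 * if either Windows call fails.  Requires Vista or later.
 */
bool
get_user_default_locale_name(char *buf, int buflen)
{
    wchar_t wname[LOCALE_NAME_MAX_LENGTH];

    /* Returns 0 on failure, otherwise the length including the NUL. */
    if (GetUserDefaultLocaleName(wname, LOCALE_NAME_MAX_LENGTH) == 0)
        return false;

    return WideCharToMultiByte(CP_UTF8, 0, wname, -1,
                               buf, buflen, NULL, NULL) != 0;
}
#endif

/*
 * Hypothetical heuristic: a name is treated as BCP 47 style if it starts
 * with a 2- or 3-letter language subtag that is followed by '-' or by
 * the end of the string.  Old-style names such as
 * "English_New Zealand.1252" start with a longer language word.
 * Old-style three-letter abbreviations would be misclassified.
 */
bool
looks_like_bcp47_name(const char *name)
{
    size_t n = 0;

    assert(name != NULL);

    /* Count the leading run of letters (the language component). */
    while (isalpha((unsigned char) name[n]))
        n++;

    return (n == 2 || n == 3) && (name[n] == '\0' || name[n] == '-');
}
```

On Windows, initdb-like code could call get_user_default_locale_name()
with a buffer of LOCALE_NAME_MAX_LENGTH * 4 bytes or so and use the
result as the template databases' locale, and the heuristic could be
used to spot old-style names coming from elsewhere. Whether the result
is acceptable to setlocale() and friends in all cases is exactly the
open question.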

No patch, but I wondered if any Windows hackers have any feedback on
the relative sanity of trying to fix all these problems this way.

[1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160
[2] https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-getuserdefaultlocalename
