Re: Windows default locale vs initdb

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Windows default locale vs initdb
Date: 2022-07-18 22:58:41
Message-ID: CA+hUKGKrGP3BhS+h1zdLYRRvhtbyaz4bhjUyPktAQfH=uS2JXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 15, 2021 at 11:32 PM Juan José Santamaría Flecha
<juanjo(dot)santamaria(at)gmail(dot)com> wrote:
> On Sun, May 16, 2021 at 6:29 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
>> On Mon, Apr 19, 2021 at 05:42:51PM +1200, Thomas Munro wrote:
>> > The question we asked ourselves
>> > multiple times in the other thread was how we're supposed to get to
>> > the modern BCP 47 form when creating the template databases. It looks
>> > like one possibility, since Vista, is to call
>> > GetUserDefaultLocaleName()[2]
>>
>> > No patch, but I wondered if any Windows hackers have any feedback on
>> > relative sanity of trying to fix all these problems this way.
>>
>> Sounds reasonable. If PostgreSQL v15 would otherwise run on Windows Server
>> 2003 R2, this is a good time to let that support end.
>>
> The value returned by GetUserDefaultLocaleName() is a system configured parameter, independent of what you set with setlocale(). It might be reasonable for initdb but not for a backend in most cases.

Agreed. Only for initdb, and only if you didn't specify a locale name
on the command line.

> You can get the locale POSIX-ish name using GetLocaleInfoEx(), but this is no longer recommended, because using LCIDs is no longer recommended [1]. Although, this would work for legacy locales. Please find attached a POC patch showing this approach.

Now that museum-grade Windows has been defenestrated, we are free to
call GetUserDefaultLocaleName(). Here's a patch.

One thing you did in your patch that I disagree with, I think, was to
convert a BCP 47 name to a POSIX name early, that is, s/-/_/. I think
we should use the locale name exactly as Windows (really, under the
covers, ICU) spells it. There is only one place in the tree today
that really wants a POSIX locale name, and that's LC_MESSAGES,
accessed by GNU gettext, not Windows. We already had code to cope
with that.

I think we should also convert to POSIX format when making the
collname in your pg_import_system_collations() proposal, so that
COLLATE "en_US" works (= a SQL identifier), but that's another
thread[1]. I don't think we should do it in collcollate or
datcollate, which is a string for the OS to interpret.

With my garbage collector hat on, I would like to rip out all of the
support for traditional locale names, eventually. Deleting kludgy
code is easy and fun -- 0002 is a first swing at that -- but there
remains an important unanswered question. How should someone
pg_upgrade a "English_Canada.1521" cluster if we now reject that name?
We'd need to do a conversion to "en-CA", or somehow tell the user to.
Hmmmm.

[1] https://www.postgresql.org/message-id/flat/CAC%2BAXB0WFjJGL1n33bRv8wsnV-3PZD0A7kkjJ2KjPH0dOWqQdg%40mail.gmail.com

Attachment Content-Type Size
0001-Default-to-BCP-47-locale-in-initdb-on-Windows.patch text/x-patch 3.5 KB
0002-Remove-support-for-old-Windows-locale-names.patch text/x-patch 19.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-07-18 23:01:08 Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)
Previous Message Jacob Champion 2022-07-18 22:53:25 Re: Commitfest Update