Re: Windows default locale vs initdb

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Windows default locale vs initdb
Date: 2022-07-20 11:44:04
Message-ID: CA+hUKGJZskvCh=Qm75UkHrY6c1QZUuC92Po9rponj1BbLmcMEA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 20, 2022 at 10:27 PM Juan José Santamaría Flecha
<juanjo(dot)santamaria(at)gmail(dot)com> wrote:
> On Tue, Jul 19, 2022 at 4:47 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> As for whether "accordingly" still applies, by the logic of
>> win32_langinfo()... Windows still considers WIN1252 to be the default
>> ANSI code page for "en-US", though it'd work with UTF-8 too. I'm not
>> sure what to make of that. The goal here was to give Windows users
>> good defaults, but WIN1252 is probably not what most people actually
>> want. Hmph.
>
>
> Still, WIN1252 is not the wrong answer for what we are asking. Even if you enable UTF-8 support [1], the system will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.

I'm still confused about what that means. Suppose we decided to
insist by adding a ".UTF-8" suffix to the name, as that page says we
can now that we're on Windows 10+, when building the default locale
name (see experimental 0002 patch, attached). It initially seemed to
have the right effect:

The database cluster will be initialized with locale "en-US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
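
For illustration only, the naming rule being tried here could be sketched
as below. This is not code from the attached 0002 patch: with_utf8_suffix
is a hypothetical helper, and it assumes the input is a BCP 47 style name
such as "en-US", where any code page part follows a ".".

```python
def with_utf8_suffix(locale_name: str) -> str:
    """Append ".UTF-8" to a locale name that has no code page part.

    Hypothetical helper for illustration; the real patch may build the
    name differently.
    """
    # Assumption: a "." in the name means a code page was already chosen,
    # so the name is returned unchanged.
    if "." in locale_name:
        return locale_name
    return locale_name + ".UTF-8"


print(with_utf8_suffix("en-US"))
```

A name that already carries a suffix, such as "en-US.UTF-8", passes
through unchanged under this assumption.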

But then the Turkish i test in contrib/citext/sql/citext_utf8.sql failed[1]:

SELECT 'i'::citext = 'İ'::citext AS t;
t
---
- t
+ f
(1 row)
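
As a rough illustration of how a result like this can depend on the case
mapping in use (an assumption for discussion, not a diagnosis of the CI
failure): a comparison that lowercases both sides gives different answers
depending on how U+0130 is lowercased. The sketch below uses Python's
locale-independent Unicode mapping, not PostgreSQL's locale-aware lower(),
and lower_equal is a hypothetical helper.

```python
# Hypothetical helper: "lowercase both sides, then compare".
# Uses Python's built-in Unicode case mapping, which ignores the OS locale,
# so it only illustrates the general effect, not citext's actual behavior.
def lower_equal(a: str, b: str) -> bool:
    return a.lower() == b.lower()


dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Under the default Unicode mapping, U+0130 lowercases to two code points:
# "i" (U+0069) followed by COMBINING DOT ABOVE (U+0307).
lowered = dotted_capital_i.lower()
print([hex(ord(ch)) for ch in lowered])

# Because of the extra combining character, this comparison is false here.
print(lower_equal("i", dotted_capital_i))
```

A mapping that instead lowercases U+0130 to a plain "i" would make the
same comparison true, which is what the test's expected output wants.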

About the pg_upgrade problem, maybe it's OK ... existing old-format
names should continue to work, but we can still remove the weird code
that does locale name tweaking, right? pg_upgraded databases should
contain fixed names (i.e. names that were fixed by old initdb, so they
should continue to work), and new clusters will get BCP 47 names.

I don't really know, I was just playing with rough ideas by sending
patches to CI here...

[1] https://cirrus-ci.com/task/6423238052937728

Attachment Content-Type Size
v3-0001-Default-to-BCP-47-locale-in-initdb-on-Windows.patch text/x-patch 3.8 KB
v3-0002-Default-to-UTF-8-in-initdb-on-Windows.patch text/x-patch 2.0 KB
v3-0003-Remove-support-for-old-Windows-locale-names.patch text/x-patch 19.6 KB
