Re: Collation versioning

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Douglas Doole <dougdoole(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2020-11-02 23:29:54
Message-ID: CAApHDvpfOzs-tfm8wfiYg1zUZS0PDzNJG=rqnPHTfvks+G9ivg@mail.gmail.com

On Tue, 3 Nov 2020 at 09:43, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Fortunately David Rowley is able to repro this on his Windows box (it
> fails even with strings that are succeeding on the other BF machines),
> so we have something to work with. The name mangling that is done in
> get_iso_localename() looks pretty interesting... It does feel a bit
> like there is some other hidden environmental factor or setting here,
> because commit 352f6f2df60 tested OK on Juan Jose's machine too.
> Hopefully more soon.

It seems to boil down to GetNLSVersionEx() not liking the "English_New
Zealand.1252" string. The space in the name does not seem to be the
problem, because if I change it to "English_Australia.1252", I get the
same issue.

Going by the docs in [1] and following the "locale name" link to [2],
there's a description there that mentions: "Generally, the pattern
<language>-<REGION> is used.". So, if I just hack the code in
get_collation_actual_version() to pass "en-NZ" to GetNLSVersionEx(),
that works fine.
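
As a rough illustration of that kind of translation (a sketch only, not
the actual hack or any proposed patch): strip the code page suffix and
look the remainder up in a table of <language>-<REGION> names. The
helper name windows_locale_to_iso() and the table are hypothetical and
cover only the locale names mentioned in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping; covers only the names that appear in this thread. */
typedef struct
{
    const char *win_name;   /* setlocale()-style name without code page */
    const char *iso_name;   /* <language>-<REGION> form */
} locale_map_entry;

static const locale_map_entry locale_map[] = {
    {"English_New Zealand", "en-NZ"},
    {"English_Australia", "en-AU"},
    {"English_United States", "en-US"},
    {"German_Germany", "de-DE"},
};

/*
 * Return the <language>-<REGION> name for a Windows-style locale name such
 * as "English_New Zealand.1252", or NULL if the name is not in the table.
 * Anything after the last '.' (the code page) is ignored.
 */
const char *
windows_locale_to_iso(const char *name)
{
    const char *dot = strrchr(name, '.');
    size_t      len = dot ? (size_t) (dot - name) : strlen(name);
    size_t      i;

    for (i = 0; i < sizeof(locale_map) / sizeof(locale_map[0]); i++)
    {
        if (strlen(locale_map[i].win_name) == len &&
            strncmp(locale_map[i].win_name, name, len) == 0)
            return locale_map[i].iso_name;
    }
    return NULL;
}
```

A caller would pass the result to GetNLSVersionEx() only when it is not
NULL; what to do with names that have no mapping is left open here.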

In [3], Juan José was passing in en-US rather than these more weird
Windows-specific locale strings, so the testing that code got when it
went in didn't include seeing if something like "English_New
Zealand.1252" would be accepted.

The "English_New Zealand.1252" string seems to come from the
setlocales() call in initdb via check_locale_name(LC_COLLATE,
lc_collate, &canonname), and fundamentally setlocale(LC_COLLATE).

I'm still a bit mystified why whelk seems unfazed by this change. You
can see from [4] that it must be passing "German_Germany.1252" to
GetNLSVersionEx(). I've tested both on Windows 8.1 and Windows 10 and
I can't get GetNLSVersionEx() to accept that. So maybe Windows 7
allowed these non-ISO formats? That theory seems to break down a bit
when you see that walleye is perfectly happy on Windows 10 (MinGW64).
You can see from [5] it mentions "The database cluster will be
initialized with locale "English_United States.1252".".

Running low on ideas for now, so thought I'd post this in case
someone thinks of something else.

David

[1] https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-getnlsversionex
[2] https://docs.microsoft.com/en-us/windows/win32/intl/locale-names
[3] https://www.postgresql.org/message-id/CAC+AXB0Eat3aLeTrbDoBB9jX863CU_+RSbgiAjcED5DcXoBoFQ@mail.gmail.com
[4] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=whelk&dt=2020-11-02%2020%3A41%3A40&stg=check-pg_upgrade
[5] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=walleye&dt=2020-11-02%2020%3A55%3A31&stg=check-pg_upgrade
