Re: Collation versioning

From: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Douglas Doole <dougdoole(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation versioning
Date: 2020-11-04 07:44:15
Message-ID: CAC+AXB2xvqr3w6QPB_THNZ0-ZkG22OXnWALz5-Y1Mn3LYYsgCg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 3, 2020 at 10:49 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

>
> So we have:
>
> 1. Windows locale names, like "English_United States.1252". Windows
> still returns these from setlocale(), so they finish up in datcollate,
> and yet some relevant APIs don't accept them, at least on some
> machines.
>
> 2. BCP 47/RFC 5646 language tags, like "en-US". Windows uses these
> in relevant new APIs, including the case in point.
>
> 3. Unix-style (XPG? ISO/IEC 15897?) locale names, like "en_US"
> ("language[_territory[.codeset]][@modifier]"). These are used for
> message catalogues.
>
> We have a VS2015+ way of converting from form 1 to form 2 (and thence
> 3 by s/-/_/), and an older way. Unfortunately, the new way looks a
> little too fuzzy: if I'm reading it right, search_locale_enum() might
> stop on either "en" or "en-AU", given "English_Australia", depending
> on the search order, no?

No, that is not the case. A bare "English" could match any locale if the
enumeration order were to change in the future; right now the order is a
given (Language, Location). "English_Australia", however, can only match
"en-AU".
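
As a rough illustration of the matching described above, here is a simplified, portable sketch. It is not the actual search_locale_enum() code: the EnumLocale struct, the sample_locales table, and match_locale() are hypothetical stand-ins for what the Windows enumeration would supply, and the comparison is case-sensitive for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one enumerated Windows locale. */
typedef struct
{
	const char *tag;		/* BCP 47 name, e.g. "en-AU" */
	const char *language;	/* language name in English */
	const char *country;	/* country name in English, NULL if neutral */
} EnumLocale;

/* Illustrative sample only; the real list and its order come from the OS. */
static const EnumLocale sample_locales[] = {
	{"en", "English", NULL},
	{"en-AU", "English", "Australia"},
	{"en-US", "English", "United States"},
};

/*
 * Return the tag of the first enumerated locale matching "input", which is
 * either "<Language>" or "<Language>_<Country>", or NULL if none matches.
 */
static const char *
match_locale(const char *input)
{
	const char *sep = strchr(input, '_');
	size_t		n = sizeof(sample_locales) / sizeof(sample_locales[0]);
	size_t		i;

	for (i = 0; i < n; i++)
	{
		const EnumLocale *loc = &sample_locales[i];

		if (loc->country == NULL || sep == NULL)
		{
			/* Only the language part can be compared. */
			if (strcmp(input, loc->language) == 0)
				return loc->tag;
		}
		else
		{
			/* Both the language and the country must match exactly. */
			size_t		lang_len = strlen(loc->language);

			if ((size_t) (sep - input) == lang_len &&
				strncmp(input, loc->language, lang_len) == 0 &&
				strcmp(sep + 1, loc->country) == 0)
				return loc->tag;
		}
	}
	return NULL;
}
```

In this sketch, a bare "English" stops on whichever English entry is enumerated first, so its result depends on the order, while "English_Australia" is rejected by every entry except "en-AU".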

> This may be fine for the purpose of looking
> up error messages with gettext() (where there is only one English
> language message catalogue, we haven't got around to translating our
> errors into 'strayan yet), but it doesn't seem like a good way to look
> up the collation version; for all I know, "en" variants might change
> independently (I doubt it in practice, but in theory it's wrong). We
> want the same algorithm that Windows uses internally to resolve the
> old style name to a collation; in other words we probably want
> something more like the code path that they took away in VS2015 :-(.
>

We could create a static table with the conversion, based on what was
discussed for commit a169155. Please find attached a spreadsheet with the
comparison. This would require maintenance as new LCIDs are released [1].
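
A minimal sketch of what such a static table might look like, under stated assumptions: the LocaleMapEntry struct, the locale_map array, and lookup_bcp47() are hypothetical names, the three entries are illustrative rather than taken from the attached spreadsheet, and the match is case-sensitive for brevity.

```c
#include <stddef.h>
#include <string.h>

/* One row of the hypothetical conversion table. */
typedef struct
{
	const char *win_name;	/* Windows locale name without codeset */
	const char *bcp47_tag;	/* corresponding BCP 47 language tag */
} LocaleMapEntry;

/* Illustrative entries only; a real table would cover every known LCID. */
static const LocaleMapEntry locale_map[] = {
	{"English_Australia", "en-AU"},
	{"English_United States", "en-US"},
	{"Spanish_Spain", "es-ES"},
};

/*
 * Look up the BCP 47 tag for a Windows locale name such as
 * "English_United States.1252". Any ".codeset" suffix is ignored.
 * Returns NULL when the name is not in the table.
 */
static const char *
lookup_bcp47(const char *win_locale)
{
	size_t		name_len = strcspn(win_locale, ".");
	size_t		n = sizeof(locale_map) / sizeof(locale_map[0]);
	size_t		i;

	for (i = 0; i < n; i++)
	{
		const LocaleMapEntry *entry = &locale_map[i];

		if (strlen(entry->win_name) == name_len &&
			strncmp(win_locale, entry->win_name, name_len) == 0)
			return entry->bcp47_tag;
	}
	return NULL;
}
```

Because the lookup is an exact match on the full name, the result does not depend on any enumeration order; the cost is that the table has to be extended by hand whenever new locales appear.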

[1]
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-lcid/70feba9f-294e-491e-b6eb-56532684c37f

Regards,

Juan José Santamaría

Attachment Content-Type Size
WindowsNLSLocales.ods application/vnd.oasis.opendocument.spreadsheet 18.3 KB
