Re: ICU locale validation / canonicalization

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ICU locale validation / canonicalization
Date: 2023-02-20 14:46:23
Message-ID: 6f32de82-88e2-480a-8421-07add8f5dee5@enterprisedb.com
Lists: pgsql-hackers

On 10.02.23 18:53, Jeff Davis wrote:
> To represent ICU locale strings in the catalog consistently, we have
> two choices, which as far as I can tell are equivalent:
>
> 1. ICU format Locale IDs. These are more readable, and still specified
> (albeit non-standard).
>
> 2. BCP47 language tags. These are standardized, there's better
> validation with "strict" mode, and we are already using them.
>
> Honestly I don't think it's hugely important which one we pick. But
> being consistent is important, so we need to pick one, and BCP 47 seems
> like the better option to me.

I found some discussion about this from when ICU support was first
added. See this message as a starting point:
https://www.postgresql.org/message-id/flat/5291804b-169e-3ba9-fdaf-fae8e7d2d959%402ndquadrant.com#96acb7eb9299c2ca64dbabcf58e11a90

There isn't much detail there, but the discussion and the current code
seem pretty convinced that

a) BCP47 tags are preferred, and
b) They don't work with ICU versions before 54.

I can't locate the source for claim b) anymore. However, it seems
pretty clear that there is some cutoff, even if it isn't exactly 54.

I would support moving this forward toward consistent BCP 47 tags, but we
would need to know exactly what the impact would be.
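
To make the cutoff question concrete, here is a minimal sketch of the kind of version gate that claim b) implies. It is not PostgreSQL code: the function name, the return labels, and the default of 54 are assumptions for illustration, and the real cutoff is exactly the part that still needs to be verified.

```python
# Hypothetical sketch; none of these names exist in PostgreSQL or ICU.
# The default cutoff comes from claim (b) above and is unverified.
ASSUMED_BCP47_CUTOFF = 54


def catalog_locale_format(icu_major_version: int,
                          cutoff: int = ASSUMED_BCP47_CUTOFF) -> str:
    """Return which spelling a catalog entry could use for an ICU major version."""
    if icu_major_version < 1:
        raise ValueError("ICU major version must be positive")
    # At or above the cutoff, prefer BCP 47 language tags;
    # below it, fall back to ICU-format locale IDs.
    if icu_major_version >= cutoff:
        return "bcp47"
    return "icu-locale-id"
```

Keeping the cutoff as a parameter means that, if it turns out not to be exactly 54, only the one constant would change.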
