Re: ICU locale validation / canonicalization

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ICU locale validation / canonicalization
Date: 2023-02-09 15:53:38
Message-ID: CA+Tgmob6mDjCCA-BL2oFOpdRdijkkVrQ09DuMBi0rV_2omPVmA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 8, 2023 at 2:59 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> We do check that the value is accepted by ICU, but ICU seems to accept
> anything and use some fallback logic. Bogus strings will typically end
> up as the "root" locale (spelled "root" or "").

I've noticed this, and I think it's really frustrating. There's barely
any documentation of what strings you're allowed to specify, and the
documentation that does exist is extremely difficult to understand.
Normally, you could work around that problem to some degree by making
a guess at what you're supposed to be doing and then seeing whether
the program accepts it, but here that doesn't work either. It just
accepts anything you give it and then you have to try to figure out
whether the behavior is what you wanted. But there's also no real
documentation of what the behavior of any collation is, so you're
apparently just supposed to magically know what collations exist and
how they behave and then you can test whether the string you put in
gave you the behavior you wanted.

Adding validation and canonicalization wouldn't cure the documentation
problems, but it would be a big help. You still wouldn't know what
string you were supposed to be passing to ICU, but if you did pass it
a string, you'd find out what it thought that string meant. I think
that would be a huge step forward.

Unfortunately, I have no idea whether your specific ideas about how to
make that happen are any good or not. But I hope they are, because the
current situation is pessimal.

--
Robert Haas
EDB: http://www.enterprisedb.com
