Re: ICU locale validation / canonicalization

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ICU locale validation / canonicalization
Date: 2023-03-23 17:16:45
Message-ID: 4efdd7bdf9c811e1a32b8d10524bae8040367068.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2023-03-23 at 07:27 +0100, Peter Eisentraut wrote:
> So, does uloc_canonicalize() always convert to ICU locale IDs?  What
> if
> you pass a language tag, does it convert it to ICU locale ID as well?

Yes.

The documentation is not clear on that point, but my testing shows that
it does. And this is only for old versions of the code, so we don't
need to worry about later versions of ICU changing that.

I thought about using uloc_forLanguageTag(), but the documentation for
that is not clear what formats it accepts as an input, so it doesn't
seem like a win. If wanted to be paranoid we could use
uloc_toLanguageTag() followed by uloc_forLanguageTag(), but that seemed
excessive.

> >
> 0002 and 0003 look ok to me now.

Thank you, committed 0002 and 0003.

> In 0002, the error "opening default collator is not supported",
> should
> that be an assert or an elog?  Is it reachable by the user?

It's not reachable by the user, but could catch a bug if we
accidentally read a NULL field from the catalog or something like that.
It seemed a worthwhile check to leave in production builds.

> You might want to check the declarations at the top of
> pg_ucol_open().
> 0003 reformats them after they were just added in 0002.  Maybe check
> that they are pgindent'ed in 0002 properly.

They seem to be pgindented fine in 0002, it was unnecessarily
reindented in 0003 and I fixed that.

I use emacs "align-current" and generally that does the right thing,
but I'll rely more on pgindent in the future.

> I don't understand patch 0004.  It seems to do two things, handle
> C/POSIX locale specifications and add an SQL-callable function.  Are
> those connected?

It's hard to test (or even exercise) the former without the latter.

I could get rid of the SQL-callable function and move the rest of the
changes into 0006. I'll see if that arrangement works better, and that
way we can add the SQL-callable function later (or perhaps not at all
if it's not desired).

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2023-03-23 17:24:40 Re: Add pg_walinspect function with block info columns
Previous Message Onur Tirtir 2023-03-23 17:11:32 RE: [EXTERNAL] Re: [PATCH] Report the query string that caused a memory error under Valgrind