Re: Add standard collation UNICODE

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add standard collation UNICODE
Date: 2023-03-08 18:25:42
Message-ID: 77f0df84a8e146bd1afa55bcaf26dcd6cc3faebd.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2023-03-08 at 07:21 +0100, Peter Eisentraut wrote:
> On 04.03.23 19:29, Jeff Davis wrote:
> > It looks like the way you've handled this is by inserting the
> > collation with collprovider=icu even if built without ICU support.
> > I think that's a new case, so we need to make sure it throws
> > reasonable user-facing errors.
>
> It would look like this:
>
> => select * from t1 order by b collate unicode;
> ERROR:  0A000: ICU is not supported in this build

Right, the error looks good. I'm just pointing out that before this
patch, having provider='i' in a build without ICU was a configuration
mistake, whereas afterward every database will have a collation with
provider='i' whether it has ICU support or not. I think that's fine;
I'm just double-checking.

Why is "unicode" only provided for the UTF-8 encoding? For "ucs_basic"
that makes some sense, because the implementation only works in UTF-8.
But here we are using ICU, and the "und" locale should work for any
ICU-supported encoding. I suggest that we use collencoding=-1 for
"unicode", and the docs can just add a note next to "ucs_basic" that it
only works for UTF-8, because that's the weird case.
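A hypothetical, simplified model of what the collencoding=-1 suggestion changes, assuming collencoding holds an encoding ID and -1 means "works for any encoding" (as the pg_collation catalog describes it). The function name and the encoding ID values below are placeholders for illustration, not PostgreSQL code:

```python
# Hypothetical, simplified model of the pg_collation.collencoding rule:
# a collation applies to a database whose encoding equals collencoding,
# or to any database when collencoding is -1.
ANY_ENCODING = -1

# Placeholder encoding IDs, for illustration only.
UTF8 = 6
LATIN1 = 8


def collation_usable(collencoding: int, db_encoding: int) -> bool:
    """Return True if a collation with this collencoding applies to db_encoding."""
    return collencoding == ANY_ENCODING or collencoding == db_encoding


# "unicode" restricted to UTF-8 (as in the patch) vs. collencoding = -1.
for label, collencoding in (("UTF-8 only", UTF8), ("any encoding", ANY_ENCODING)):
    print(label, "usable in a LATIN1 database:", collation_usable(collencoding, LATIN1))
```

Under this model, only the -1 variant would make "unicode" available in a LATIN1 database, which is the point of the suggestion.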

For the docs, I suggest that you clarify that "ucs_basic" has the same
behavior as the C locale does *in the UTF-8 encoding*. Not all users
will pick up on the subtlety that the C locale behaves differently in
different encodings.
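To make the encoding subtlety concrete: a minimal Python sketch, assuming the C locale is modeled as a plain bytewise (memcmp-style) comparison of the encoded strings, and using EUC_JP as an example non-UTF-8 encoding. The helper names are hypothetical. In UTF-8, byte order coincides with Unicode code point order (the order "ucs_basic" is documented to use); in EUC_JP it does not:

```python
# Sketch: a C-locale-style comparison looks only at encoded bytes, so
# whether it matches Unicode code point order depends on the encoding.

def c_locale_order(words, encoding):
    """Sort strings by their bytes in the given encoding (C-locale stand-in)."""
    return sorted(words, key=lambda w: w.encode(encoding))


def code_point_order(words):
    """Sort strings by Unicode code point (Python's default str ordering)."""
    return sorted(words)


# "a" (U+0061), GREEK CAPITAL LETTER OMEGA (U+03A9), HIRAGANA LETTER A (U+3042)
words = ["\u3042", "\u03a9", "a"]

for enc in ("utf-8", "euc_jp"):
    same = c_locale_order(words, enc) == code_point_order(words)
    print(f"{enc}: bytewise order matches code point order? {same}")
```

In EUC_JP, HIRAGANA LETTER A is encoded as A4 A2 and OMEGA as A6 B8, so bytewise comparison puts the hiragana first even though its code point is higher.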

Other than that, it looks good.

--
Jeff Davis
PostgreSQL Contributor Team - AWS
