Re: Does UCS_BASIC have the right CTYPE?

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)postgresql(dot)org
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>
Subject: Re: Does UCS_BASIC have the right CTYPE?
Date: 2023-10-26 16:21:40
Message-ID: d7bc1d3f56a3604451beb7ac91455de26d857b83.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2023-10-26 at 16:49 +0200, Peter Eisentraut wrote:
> On 25.10.23 20:32, Jeff Davis wrote:
> > But what should the result of UPPER('á' COLLATE UCS_BASIC) be? In
> > Postgres, the answer is 'á', but intuitively, one could reasonably
> > expect the answer to be 'Á'.
>
> I think that's right.  But what would you put into ctype to make that
> happen?

It looks like using Unicode files to implement
upper()/lower()/initcap() behavior would not be very hard. The Default
Case Algorithm only needs a simple mapping for toUppercase() and
toLowercase().

Our initcap() is not defined in the standard, and we document that it
only differentiates between alphanumeric and non-alphanumeric
characters, so we could get that behavior pretty easily as well. If we
wanted to do it the Unicode way instead, we can follow the
toTitlecase() part of the Default Case Algorithm, which is based on
word breaks and would require another lookup table for that.

I've already posted a patch that includes a lookup table for the
General Category, so that could be used for the rest of the ctype
behavior.

Doing ctype based on built-in Unicode tables would be a good use case
for the "builtin" provider that I had previously proposed.

> The standard doesn't have the notion of locale-dependent case
> conversion.

That explains it. Interesting.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-10-26 16:37:52 Re: CRC32C Parallel Computation Optimization on ARM
Previous Message Robert Haas 2023-10-26 15:47:32 visibility of open cursors in pg_stat_activity