
Re: Does UCS_BASIC have the right CTYPE?

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)postgresql(dot)org
Cc:	Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>
Subject:	Re: Does UCS_BASIC have the right CTYPE?
Date:	2023-10-26 18:42:27
Message-ID:	70b79878856d4f2cabe67fb8e3420a92ea641214.camel@j-davis.com
Lists:	pgsql-hackers

On Thu, 2023-10-26 at 09:21 -0700, Jeff Davis wrote:
> Our initcap() is not defined in the standard, and we document that it
> only differentiates between alphanumeric and non-alphanumeric
> characters, so we could get that behavior pretty easily as well. If we
> wanted to do it the Unicode way instead, we can follow the
> toTitlecase() part of the Default Case Algorithm, which is based on
> word breaks and would require another lookup table for that.

Correction: the rules for word breaks are fairly complex, so it would
not be worth it to try to replicate that just to support initcap(). We
could just use the simple, existing, and documented rules for initcap()
which only differentiate between alphanumeric and not. Anyone who wants
the more sophisticated rules can just use an ICU collation with
initcap().
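For illustration, here is a minimal sketch of those simple rules. It is written in Python rather than PostgreSQL's C, and the name initcap_simple is hypothetical. A "word" is taken to be a maximal run of alphanumeric characters: its first character is uppercased, the rest are lowercased, and every non-alphanumeric character passes through unchanged and ends the current word. The sketch relies on Python's Unicode-aware str.isalnum(), str.upper() and str.lower(), which may classify or map some characters differently than libc or ICU would.

```python
def initcap_simple(s: str) -> str:
    """Hypothetical sketch of the documented initcap() rules: uppercase the
    first character of each run of alphanumeric characters, lowercase the
    rest, and pass non-alphanumeric characters through as word separators."""
    out = []
    prev_alnum = False  # was the previous character part of a word?
    for ch in s:
        if ch.isalnum():
            # First character of a word is uppercased, later ones lowercased.
            out.append(ch.lower() if prev_alnum else ch.upper())
            prev_alnum = True
        else:
            # Any non-alphanumeric character ends the current word.
            out.append(ch)
            prev_alnum = False
    return "".join(out)


if __name__ == "__main__":
    print(initcap_simple("hi THOMAS"))         # Hi Thomas
    print(initcap_simple("o'neil-smith 2nd"))  # O'Neil-Smith 2nd
```

No word-break tables are involved: the only per-character inputs are "is it alphanumeric" and the upper/lower mappings, which is what makes this rule cheap compared with Unicode's toTitlecase().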

The point stands that it would be pretty simple to have a collation
that handles upper() and lower() in a standards-compliant way without
relying on libc or ICU. Unfortunately it's too late to call that
collation UCS_BASIC, but it would still be useful.
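As a rough sketch of what such locale-independent upper() and lower() behavior looks like, assuming "standards-compliant" here means Unicode Default Case Conversion (full case mappings plus the Final_Sigma context rule): CPython's str.upper() and str.lower() implement that conversion from tables compiled from the Unicode Character Database, without consulting libc's locale or ICU. The wrapper names below are hypothetical and only stand in for a built-in collation's case functions.

```python
def upper_default(s: str) -> str:
    # Unicode toUppercase(): full mappings, so one character can expand to
    # several (U+00DF LATIN SMALL LETTER SHARP S becomes "SS").
    return s.upper()


def lower_default(s: str) -> str:
    # Unicode toLowercase(): full mappings plus the Final_Sigma rule, so a
    # word-final capital sigma lowercases to U+03C2 rather than U+03C3.
    return s.lower()


if __name__ == "__main__":
    print(upper_default("stra\u00dfe"))                           # STRASSE
    print(lower_default("\u039f\u0394\u039f\u03a3") == "\u03bf\u03b4\u03bf\u03c2")  # True
    print(len(lower_default("\u0130")))                           # 2 (i + U+0307)
```

The results do not depend on the process locale, which is the property a built-in collation would need; a C implementation would likewise need only the UnicodeData and SpecialCasing mappings plus the one context rule for sigma.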

Regards,
Jeff Davis

In response to

Re: Does UCS_BASIC have the right CTYPE? at 2023-10-26 16:21:40 from Jeff Davis
