Re: Built-in CTYPE provider

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Built-in CTYPE provider
Date: 2024-01-12 02:02:30
Message-ID: 12e4f6a78403b33c303c20e44976f891d879be09.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2024-01-10 at 23:56 +0100, Daniel Verite wrote:
> A related comment is about naming the builtin locale C.UTF-8, the same
> name as in libc. On one hand this is semantically sound, but on the
> other hand, it's likely to confuse people. What about using completely
> different names, like "pg_unicode" or something else prefixed by "pg_"
> both for the locale name and the collation name (currently
> C.UTF-8/c_utf8)?

New version attached. Changes:

* Named the collation object PG_C_UTF8, which seems like a good idea to
prevent name conflicts with existing collations. The locale name is
still C.UTF-8, which makes sense to me because its behavior so closely
matches that of the libc locale of the same name.

* Added missing documentation for initdb --builtin-locale

* Refactored the upper/lower/initcap implementations

* Improved tests for case conversions where the byte length of the
UTF-8-encoded string changes (the length in characters doesn't change,
because we don't do full case mapping).
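To illustrate the byte-length point, here is a minimal Python sketch; it is not the patch's C code or its actual tests. Python's `str.upper()`/`str.lower()` apply full case mapping (for example, 'ß' becomes 'SS'), so the samples are deliberately limited to characters whose full and simple mappings coincide: each maps one codepoint to one codepoint, yet the UTF-8 byte length changes.

```python
# Simple (1:1) case mapping keeps the codepoint count fixed, but the
# UTF-8 byte length can still grow or shrink. Samples are restricted to
# characters whose full and simple mappings are the same single codepoint.
SAMPLES = [
    ("\u0131", "upper"),  # LATIN SMALL LETTER DOTLESS I -> 'I' (2 -> 1 bytes)
    ("\u017f", "upper"),  # LATIN SMALL LETTER LONG S -> 'S' (2 -> 1 bytes)
    ("\u023f", "upper"),  # S WITH SWASH TAIL -> U+2C7E (2 -> 3 bytes)
    ("\u212a", "lower"),  # KELVIN SIGN -> 'k' (3 -> 1 bytes)
]


def convert(s: str, how: str) -> str:
    return s.upper() if how == "upper" else s.lower()


def report(s: str, how: str) -> tuple[int, int, int, int]:
    """Return (chars in, chars out, UTF-8 bytes in, UTF-8 bytes out)."""
    out = convert(s, how)
    return (len(s), len(out), len(s.encode("utf-8")), len(out.encode("utf-8")))


for s, how in SAMPLES:
    chars_in, chars_out, bytes_in, bytes_out = report(s, how)
    print(f"U+{ord(s):04X} {how}: chars {chars_in}->{chars_out}, "
          f"bytes {bytes_in}->{bytes_out}")
```

A test that only compared character counts would miss these cases, which is why the byte length is the interesting quantity here.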

* No longer uses titlecase mappings -- libc doesn't do that, so it was
an unnecessary difference in case mapping behavior.
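The difference matters for digraph characters such as U+01C6 (dž), whose titlecase form U+01C5 (Dž) differs from its uppercase form U+01C4 (DŽ). Below is a hypothetical initcap-style sketch in Python (the function name `initcap_sketch` and its word-boundary rule, a word being a run of alphanumerics, are assumptions, not the patch's implementation) showing what changes when the first letter of a word goes through the titlecase mapping versus the uppercase mapping.

```python
def initcap_sketch(s: str, use_titlecase: bool) -> str:
    """Hypothetical initcap(): first letter of each word mapped to title
    or upper case, remaining letters lowercased. Python's str methods
    apply full mappings, which is fine for the characters used here."""
    out = []
    prev_alnum = False
    for ch in s:
        if prev_alnum:
            out.append(ch.lower())
        else:
            # str.title() applies the Unicode titlecase mapping;
            # str.upper() applies the uppercase mapping (libc-like here).
            out.append(ch.title() if use_titlecase else ch.upper())
        prev_alnum = ch.isalnum()
    return "".join(out)


print(initcap_sketch("\u01c6emal", use_titlecase=True))   # U+01C5 first
print(initcap_sketch("\u01c6emal", use_titlecase=False))  # U+01C4 first
```

For ordinary letters the two modes agree; only characters with a distinct titlecase mapping, like the digraphs above, produce different results.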

* Improved test report per Jeremy's suggestion: now it reports the
number of codepoints tested.
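A report like this can be sketched as an exhaustive comparison over every Unicode scalar value (U+0000..U+10FFFF minus the 2048 surrogates, i.e. 1,112,064 codepoints). The harness below is a hypothetical Python illustration; the names `scalar_values` and `compare_mappings` are invented, and the real test may restrict itself to assigned codepoints or compare against a different reference.

```python
from typing import Callable, Iterator


def scalar_values() -> Iterator[int]:
    """Yield all Unicode scalar values, skipping surrogates U+D800..U+DFFF."""
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue
        yield cp


def compare_mappings(f: Callable[[str], str],
                     g: Callable[[str], str]) -> tuple[int, int]:
    """Apply f (implementation under test) and g (reference) to every
    scalar value; return (codepoints tested, mismatches)."""
    tested = mismatches = 0
    for cp in scalar_values():
        ch = chr(cp)
        tested += 1
        if f(ch) != g(ch):
            mismatches += 1
    return tested, mismatches


if __name__ == "__main__":
    # Trivial self-comparison, just to exercise the harness.
    tested, mismatches = compare_mappings(str.upper, lambda c: c.upper())
    print(f"{tested} codepoints tested, {mismatches} mismatches")
```

Reporting the tested count alongside the mismatch count makes it obvious if the loop silently covered fewer codepoints than intended.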

Jeremy also raised a problem with old versions of psql connecting to a
new server: the \l and \dO commands won't work. I'm not sure exactly
what to do there, but I could work around it by adding a new catalog
field rather than renaming an existing one (though that's not ideal).

Regards,
Jeff Davis

Attachment Content-Type Size
v16-0002-Add-Unicode-property-tables.patch text/x-patch 91.5 KB
v16-0003-Add-unicode-case-mapping-tables-and-functions.patch text/x-patch 145.0 KB
v16-0004-Catalog-changes-preparing-for-builtin-collation-.patch text/x-patch 48.3 KB
v16-0005-Introduce-collation-provider-builtin-for-C-and-C.patch text/x-patch 75.6 KB