Re: Built-in CTYPE provider

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Built-in CTYPE provider
Date: 2024-02-13 06:24:32
Message-ID: 27bb0e52-801d-4f73-a0a4-02cfdd4a9ada@eisentraut.org
Lists: pgsql-hackers

On 13.02.24 03:01, Jeff Davis wrote:
> 1. The SQL spec mentions the capitalization of "ß" as "SS"
> specifically. Should UCS_BASIC use the unconditional mappings in
> SpecialCasing.txt? I already have some code to do that (not posted
> yet).

It is my understanding that "correct" Unicode case conversion needs to
use at least some parts of SpecialCasing.txt. The header of the file says

"For compatibility, the UnicodeData.txt file only contains simple case
mappings for characters where they are one-to-one and independent of
context and language. The data in this file, combined with the simple
case mappings in UnicodeData.txt, defines the full case mappings [...]"

I read this as saying that using UnicodeData.txt by itself is incomplete.

I think we need to use the "Unconditional" mappings and the "Conditional
Language-Insensitive" mappings (which is just the Greek final sigma
rule). Obviously, skip the "Language-Sensitive" mappings.
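
To make the distinction concrete, here is a rough Python sketch of simple
versus full case mapping. It is only an illustration, not proposed code:
the table is a hand-picked excerpt of the unconditional section, all
function names are made up, the simple mappings are approximated with
Python's built-in single-character conversions, and the Cased and
Case_Ignorable properties are approximated from general categories.

```python
import unicodedata

# Hypothetical excerpt of the unconditional mappings in SpecialCasing.txt
# (uppercase column only); the real file has many more rows.
UNCONDITIONAL_UPPER = {
    "\u00DF": "SS",  # LATIN SMALL LETTER SHARP S
    "\uFB01": "FI",  # LATIN SMALL LIGATURE FI
}

CAPITAL_SIGMA = "\u03A3"
FINAL_SIGMA = "\u03C2"


def simple_upper(ch):
    # Stand-in for the one-to-one mapping in UnicodeData.txt (assumption):
    # keep the character when the built-in result is not a single character.
    up = ch.upper()
    return up if len(up) == 1 else ch


def simple_lower(ch):
    # Same stand-in, for lowercase.
    low = ch.lower()
    return low if len(low) == 1 else ch


def is_cased(ch):
    # Approximation of the Unicode "Cased" property.
    return ch.isupper() or ch.islower() or unicodedata.category(ch) == "Lt"


def is_case_ignorable(ch):
    # Approximation of "Case_Ignorable": marks, format and modifier
    # characters, plus a few punctuation marks.
    return (unicodedata.category(ch) in {"Mn", "Me", "Cf", "Lm", "Sk"}
            or ch in "'.:\u00B7\u2019")


def is_final_sigma(text, i):
    # Preceded by a cased letter (skipping case-ignorables) and not
    # followed by one.
    j = i - 1
    while j >= 0 and is_case_ignorable(text[j]):
        j -= 1
    if j < 0 or not is_cased(text[j]):
        return False
    k = i + 1
    while k < len(text) and is_case_ignorable(text[k]):
        k += 1
    return k == len(text) or not is_cased(text[k])


def full_upper(text):
    # Unconditional special-casing entries override the simple mapping.
    return "".join(UNCONDITIONAL_UPPER.get(ch, simple_upper(ch)) for ch in text)


def full_lower(text):
    # Simple mapping plus the one language-insensitive conditional rule.
    out = []
    for i, ch in enumerate(text):
        if ch == CAPITAL_SIGMA and is_final_sigma(text, i):
            out.append(FINAL_SIGMA)
        else:
            out.append(simple_lower(ch))
    return "".join(out)
```

With only the simple mappings, "ß" is left unchanged on uppercasing,
while full_upper() turns it into "SS"; full_lower() yields a final sigma
only when the capital sigma ends a word.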
