Re: [PATCH] Completed unaccent dictionary with many missing characters

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Completed unaccent dictionary with many missing characters
Date: 2022-06-23 04:39:26
Message-ID: YrPufpsLPpnr8YY5@paquier.xyz
Lists: pgsql-hackers

On Tue, Jun 21, 2022 at 03:41:48PM +0200, Przemysław Sztoch wrote:
> Thomas Munro wrote on 21.06.2022 02:53:
>> Oh, we're using CLDR 41, which reminds me: CLDR 36 added SOUND
>> RECORDING COPYRIGHT[1] so we could drop it from special_cases().

Indeed.

>> Hmm, is it possible to get rid of CYRILLIC CAPITAL LETTER IO and
>> CYRILLIC SMALL LETTER IO by adding Cyrillic to PLAIN_LETTER_RANGES?

That's a good point.  There are quite a few Cyrillic characters
missing a conversion, apparently.
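
To illustrate the idea, here is a minimal sketch, not the actual script.
It assumes that a letter range check works roughly like
PLAIN_LETTER_RANGES; the Cyrillic bounds and the helper names
is_plain_letter() and strip_marks() are hypothetical.  Once the base
Cyrillic letters count as plain letters, CYRILLIC CAPITAL/SMALL LETTER IO
can be handled by their canonical decomposition instead of a special case:

```python
import unicodedata

# Hypothetical ranges for illustration, not copied from the real script.
PLAIN_LETTER_RANGES = (
    (ord("a"), ord("z")),
    (ord("A"), ord("Z")),
    (0x0410, 0x042F),  # assumed: CYRILLIC CAPITAL LETTER A..YA
    (0x0430, 0x044F),  # assumed: CYRILLIC SMALL LETTER A..YA
)


def is_plain_letter(ch):
    """True if ch falls inside one of the plain-letter ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PLAIN_LETTER_RANGES)


def strip_marks(ch):
    """Return the base letter if ch canonically decomposes into a plain
    letter followed only by combining marks, else None."""
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    if marks and is_plain_letter(base) and all(
        unicodedata.combining(m) for m in marks
    ):
        return base
    return None


for ch in ("\u0401", "\u0451"):  # CYRILLIC CAPITAL/SMALL LETTER IO
    print(unicodedata.name(ch), "->", unicodedata.name(strip_marks(ch)))
```

Both IO letters decompose to IE plus COMBINING DIAERESIS, so they only
need the base letter to be accepted by the range check.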

>> That'd leave just DEGREE CELSIUS and DEGREE FAHRENHEIT. Not sure how
>> to kill those last two special cases -- they should be directly
>> replaced by their decomposition.
>>
>> [1] https://unicode-org.atlassian.net/browse/CLDR-11383
>
> In patch v3, support for Cyrillic is added.
> The special-case function has been purged.
> Added support for category So (Other Symbol). This category includes
> the characters from special_cases().
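
For reference, the three characters discussed above can be inspected with
Python's unicodedata module.  This is only a sketch of the Unicode
properties involved, not of how the patch handles them;
compat_expansion() is a hypothetical helper name:

```python
import unicodedata


def compat_expansion(ch):
    """Return the NFKD expansion of ch if it differs from ch, else None."""
    expanded = unicodedata.normalize("NFKD", ch)
    return expanded if expanded != ch else None


# DEGREE CELSIUS, DEGREE FAHRENHEIT, SOUND RECORDING COPYRIGHT
for cp in (0x2103, 0x2109, 0x2117):
    ch = chr(cp)
    print(
        f"U+{cp:04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        repr(compat_expansion(ch)),
    )
```

All three are in category So.  The two degree symbols carry a
compatibility decomposition (DEGREE SIGN followed by C or F), while SOUND
RECORDING COPYRIGHT has no decomposition and so depends on another source
for its replacement.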

I think that we'd better split v3 into more patches to keep each
improvement isolated.  The addition of Cyrillic characters to the
range of plain letters and the removal of SOUND RECORDING COPYRIGHT
from the special cases can each be done on their own, before
considering the original case tackled by this thread.
--
Michael
