Re: BUG #18362: unaccent rules and Old Greek text

From: Cees van Zeeland <cees(dot)van(dot)zeeland(at)freedom(dot)nl>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18362: unaccent rules and Old Greek text
Date: 2024-03-01 15:54:07
Message-ID: 63c65b3a-d142-409d-92ec-2a7d1df6f697@freedom.nl
Lists: pgsql-bugs

Hi Thomas,

I found
https://www.unicode.org/Public/15.1.0/ucd/CompositionExclusions.txt,
which might be useful for identifying the characters we are looking for.
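One category that file deals with is singleton decompositions: characters whose canonical decomposition is exactly one other code point. Below is a minimal sketch, assuming that property is the one we care about, that checks it for the series U+1F70 - U+1F7D. It uses Python's standard `unicodedata` module instead of parsing the 15.1 files, so results follow whatever Unicode version the local Python ships (an assumption; these characters have been stable for a long time). The helper names `is_singleton` and `strip_marks` are hypothetical and not part of the existing unaccent rules script.

```python
import unicodedata

def is_singleton(ch: str) -> bool:
    """True if ch has a canonical (untagged) decomposition to a single code point."""
    decomp = unicodedata.decomposition(ch)
    # Compatibility decompositions start with a tag such as "<font>"; skip them.
    if not decomp or decomp.startswith("<"):
        return False
    return len(decomp.split()) == 1

def strip_marks(ch: str) -> str:
    """Fully decompose ch (NFD) and drop characters with a non-zero combining class."""
    return "".join(c for c in unicodedata.normalize("NFD", ch)
                   if not unicodedata.combining(c))

# Hypothetical check over the series U+1F70..U+1F7D discussed in this thread.
for cp in range(0x1F70, 0x1F7E):
    ch = chr(cp)
    print(f"U+{cp:04X} {ch} decomposition={unicodedata.decomposition(ch)!r} "
          f"singleton={is_singleton(ch)} base={strip_marks(ch)}")
```

In this sketch the oxia forms (U+1F71, U+1F73, ..., U+1F7D) are the singletons, each pointing at the matching tonos letter (e.g. U+1F71 to U+03AC), while the varia forms decompose to a base letter plus U+0300. After stripping marks, both halves of the series reduce to plain α, ε, η, ι, ο, υ, ω.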

Hope this helps.

Cees

On 01/03/2024 02:53, Thomas Munro wrote:
> On Tue, Feb 27, 2024 at 1:33 AM Cees van Zeeland
> <cees(dot)van(dot)zeeland(at)freedom(dot)nl> wrote:
>> I'm not an expert, but computers obviously distinguish between the two versions of these characters.
>> We are talking about this series:
>> U+1F70 - U+1F7D: ὰ ά ὲ έ ὴ ή ὶ ί ὸ ό ὺ ύ ὼ ώ
>> Is it possible to filter or limit the mappings produced by the script to this range in some way?
> Right, so to get this in we either need to decide that we're OK with
> adding that many characters, or figure out some systematic way to
> select just the ones we want. One hint that might be helpful if
> someone wants to investigate: I suspect that a lot of those mappings
> might be marked with <font>, which seems to be for code points for
> alternative renderings ("mathematical" bold, italic, fraktur etc), so
> perhaps we could filter them out that way without losing the
> oxia-marked characters if that's the way it has to be.
>
> I think all the relevant part of the character database file is described here:
>
> https://unicode.org/reports/tr44/#Property_Values
>
> The file we're currently using is 15.1:
>
> https://www.unicode.org/Public/15.1.0/ucd/UnicodeData.txt
>
> I registered this thread as https://commitfest.postgresql.org/47/4873/ .
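The `<font>` filtering idea above can be sketched against the UnicodeData.txt format, where each line has semicolon-separated fields and field 5 holds the decomposition mapping, optionally prefixed with a tag such as `<font>` (see TR44). This is a standalone illustration, not a patch to the existing rules script; `parse_decompositions`, `select_mappings` and the `SAMPLE` lines are hypothetical, and the sample lines are only meant to show the format.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

def parse_decompositions(lines: Iterable[str]) -> Iterator[Tuple[int, Optional[str], List[int]]]:
    """Yield (code point, tag or None, target code points) from UnicodeData.txt lines."""
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 6 or not fields[5]:
            continue  # no decomposition mapping on this line
        parts = fields[5].split()
        # A leading "<...>" token marks a compatibility decomposition type.
        tag = parts[0] if parts[0].startswith("<") else None
        targets = [int(p, 16) for p in (parts[1:] if tag else parts)]
        yield int(fields[0], 16), tag, targets

def select_mappings(lines: Iterable[str], excluded_tags=frozenset({"<font>"})):
    """Keep decompositions whose tag is not excluded (hypothetical filter)."""
    return {cp: targets for cp, tag, targets in parse_decompositions(lines)
            if tag not in excluded_tags}

# Illustrative lines in UnicodeData.txt format.
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "03AC;GREEK SMALL LETTER ALPHA WITH TONOS;Ll;0;L;03B1 0301;;;;N;;;0386;;0386",
    "1F71;GREEK SMALL LETTER ALPHA WITH OXIA;Ll;0;L;03AC;;;;N;;;1FBB;;1FBB",
    "1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;",
]

for cp, targets in select_mappings(SAMPLE).items():
    print(f"U+{cp:04X} -> " + " ".join(f"U+{t:04X}" for t in targets))
```

With the default exclusion, the mathematical bold letter is dropped while the oxia mapping survives. Restricting the output to U+1F70 - U+1F7D instead would be one extra condition such as `0x1F70 <= cp <= 0x1F7D` in the comprehension.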
