From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | cees(dot)van(dot)zeeland(at)freedom(dot)nl, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18362: unaccent rules and Old Greek text |
Date: | 2024-02-25 23:25:36 |
Message-ID: | ZdvMcEkMYoMqELiG@paquier.xyz |
Lists: | pgsql-bugs |
On Mon, Feb 26, 2024 at 12:15:57PM +1300, Thomas Munro wrote:
> The Python script is looking for combining sequences that add accents,
> but this one has just "03AC" in the combining sequence field, so it's
> a kind of "simple" redirection that points here:
>
> 03AC;GREEK SMALL LETTER ALPHA WITH TONOS;Ll;0;L;03B1 0301;;;;N;GREEK
> SMALL LETTER ALPHA TONOS;;0386;;0386
>
> That has a normal looking sequence that we can understand (α + an
> accent). If I tell the script to follow such "simple" redirections, I
> get over a thousand new mappings, including those. See attached.
> There is probably more correct terminology that I'm using here...
Ah, you've beaten me to it. Yes, that's pretty much the impression I
was getting while looking at the set of characters in UnicodeData.txt. I
am not entirely sure if what you are doing is the best way to do it,
but the set of characters generated in unaccent.rules makes sense
here. I am surprised to see that many, TBH.
Perhaps you should add a few characters of these series to
unaccent.sql?
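
To make the "simple" redirection described in the quoted text concrete, here is a minimal Python sketch. It is not the logic of the script or of the attached patch: it reads decompositions from Python's standard unicodedata module instead of parsing the data file, and the names follow_simple_redirections and base_letters are hypothetical. U+1F71 (GREEK SMALL LETTER ALPHA WITH OXIA) is used as an assumed example of a character whose decomposition field holds only "03AC".

```python
import unicodedata


def follow_simple_redirections(ch):
    """Chase decompositions that consist of a single code point.

    Example: U+1F71 decomposes to just "03AC", which in turn
    decomposes to "03B1 0301" (alpha + combining acute accent).
    """
    seen = set()
    while ch not in seen:
        seen.add(ch)
        fields = unicodedata.decomposition(ch).split()
        # A lone hex field without a "<tag>" is a plain redirection.
        if len(fields) == 1 and not fields[0].startswith("<"):
            ch = chr(int(fields[0], 16))
            continue
        break
    return ch


def base_letters(ch):
    """Return the character with combining marks dropped (sketch only)."""
    target = follow_simple_redirections(ch)
    fields = unicodedata.decomposition(target).split()
    # No decomposition, or a compatibility one such as "<compat> ...":
    # leave the character alone in this sketch.
    if not fields or fields[0].startswith("<"):
        return target
    parts = [chr(int(f, 16)) for f in fields]
    # Keep non-combining parts and resolve them recursively, since a
    # part may itself be an accented letter.
    return "".join(base_letters(p) for p in parts
                   if unicodedata.combining(p) == 0)


if __name__ == "__main__":
    print(unicodedata.decomposition("\u1f71"))  # 03AC
    print(unicodedata.decomposition("\u03ac"))  # 03B1 0301
    print(base_letters("\u1f71"))               # α
```

Without the redirection step, a character such as U+1F71 shows no combining sequence at all in its own entry, which is why it would be missed by a script that only looks for a letter followed by an accent.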
--
Michael