From: | Cees van Zeeland <cees(dot)van(dot)zeeland(at)freedom(dot)nl> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18362: unaccent rules and Old Greek text |
Date: | 2024-02-26 12:33:28 |
Message-ID: | d9875ca6-6438-4c52-adc1-5bc2ed28c362@freedom.nl |
Lists: | pgsql-bugs |
Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> If I tell the script to follow such "simple" redirections, I
> get over a thousand new mappings, including those. See attached.
> There is probably more correct terminology that I'm using here...
Michael Paquier wrote:
> It seems to me that it is a bit more complicated than that, because
> Unicode.data decomposes the characters with Oxia as characters with
> Tonos, and then characters with Tonos are decomposed with the "base"
> alphabet characters + Tonos. We do a recursive lookup at the unicode
> table in get_plain_letter() and is_letter_with_marks(), so it seems to
> me that we're not missing much, and I suspect that there should be no
> need for a new custom range of characters..
>
> Cees, perhaps you would like to get a shot at that?
>
> [1]: https://en.wikipedia.org/wiki/Greek_diacritics#Unicode
I'm not an expert, but computers evidently distinguish between the
two versions of these characters.
We are talking about this series:
U+1F70 - U+1F7D: ὰ ά ὲ έ ὴ ή ὶ ί ὸ ό ὺ ύ ὼ ώ
Is it possible to limit the redirection in the script to this range
in some way?
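To illustrate what I mean, here is a rough Python sketch. It is not the code of the actual script; the names in_range, plain_letter and rule_for are made up for this example, and it uses Python's built-in unicodedata module instead of reading the Unicode data file directly. The idea is to follow a single-code-point "redirection" (Oxia to Tonos) only when the starting character lies in U+1F70 - U+1F7D:

```python
import unicodedata

# Range under discussion: Greek Extended vowels with varia/oxia.
RANGE_START = 0x1F70
RANGE_END = 0x1F7D


def in_range(codepoint):
    """True if the code point lies in U+1F70 - U+1F7D."""
    return RANGE_START <= codepoint <= RANGE_END


def plain_letter(char, allow_redirect):
    """Recursively strip combining marks using canonical decompositions.

    A decomposition consisting of a single code point (a "redirection",
    e.g. Oxia to Tonos) is followed only if allow_redirect is true.
    """
    decomp = unicodedata.decomposition(char)
    if not decomp or decomp.startswith("<"):
        return char  # no decomposition, or a compatibility one
    parts = [chr(int(cp, 16)) for cp in decomp.split()]
    if len(parts) == 1 and not allow_redirect:
        return char  # redirection not permitted for this character
    base = [p for p in parts if unicodedata.combining(p) == 0]
    if len(base) != 1:
        return char  # not "one base letter plus marks"
    return plain_letter(base[0], allow_redirect)


def rule_for(char):
    """Unaccented form of char; redirections only inside the range."""
    return plain_letter(char, allow_redirect=in_range(ord(char)))


if __name__ == "__main__":
    for cp in range(RANGE_START, RANGE_END + 1):
        char = chr(cp)
        print("U+%04X %s -> %s" % (cp, char, rule_for(char)))
```

With this filter, a character outside the range that decomposes to a single code point (for example U+1FBB) is left unchanged, so the over a thousand extra mappings would not appear.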
~
Cees