Re: BUG #18362: unaccent rules and Old Greek text

From: Cees van Zeeland <cees(dot)van(dot)zeeland(at)freedom(dot)nl>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18362: unaccent rules and Old Greek text
Date: 2024-02-26 12:33:28
Message-ID: d9875ca6-6438-4c52-adc1-5bc2ed28c362@freedom.nl
Lists: pgsql-bugs

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>    If I tell the script to follow such "simple" redirections, I
>   get over a thousand new mappings, including those.  See attached.
>   There is probably more correct terminology that I'm using here...

Michael Paquier wrote:
> It seems to me that it is a bit more complicated than that, because
> Unicode.data decomposes the characters with Oxia as characters with
> Tonos, and then characters with Tonos are decomposed with the "base"
> alphabet characters + Tonos.  We do a recursive lookup at the unicode
> table in get_plain_letter() and is_letter_with_marks(), so it seems to
> me that we're not missing much, and I suspect that there should be no
> need for a new custom range of characters..
>
> Cees, perhaps you would like to get a shot at that?
>
> [1]: https://en.wikipedia.org/wiki/Greek_diacritics#Unicode
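The two-step decomposition described above can be checked with a minimal sketch. It assumes Python's standard unicodedata module as a stand-in for the Unicode data table; the helper names decomposition_chain() and plain_letter() are made up for illustration and are not the script's get_plain_letter() or is_letter_with_marks().

```python
import unicodedata


def decomposition_chain(ch):
    """Follow single-codepoint canonical decompositions, returning every step.

    Hypothetical helper for illustration only.
    """
    chain = [ch]
    while True:
        fields = unicodedata.decomposition(chain[-1]).split()
        # Stop when there is no decomposition, when it is a compatibility
        # decomposition (first field is a <tag>), or when it expands to
        # more than one codepoint (base letter + combining mark).
        if len(fields) != 1 or fields[0].startswith("<"):
            break
        chain.append(chr(int(fields[0], 16)))
    return chain


def plain_letter(ch):
    """Fully decompose a character (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    alpha_oxia = "\u1f71"  # GREEK SMALL LETTER ALPHA WITH OXIA
    for step in decomposition_chain(alpha_oxia):
        print(f"U+{ord(step):04X} {unicodedata.name(step)}")
    print(f"plain letter: U+{ord(plain_letter(alpha_oxia)):04X}")
```

Here U+1F71 (alpha with oxia) first maps to the single codepoint U+03AC (alpha with tonos), which in turn decomposes to U+03B1 plus the combining acute accent U+0301.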

I'm not an expert, but computers evidently distinguish between the
two versions of these characters.
We are talking about this series:
U+1F70 - U+1F7D:    ὰ     ά     ὲ     έ     ὴ     ή     ὶ     ί     ὸ     ό     ὺ     ύ     ὼ     ώ
Is it possible to limit the redirection in the script to this range
in some way?
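As a rough sketch of what such a filter might look like, assuming Python's standard unicodedata module rather than the script's own parsing of the Unicode data file (the names simple_redirect() and follow_redirect() and the two range constants are hypothetical, not taken from the script):

```python
import unicodedata

# Range under discussion: Greek vowels with varia / oxia.
RANGE_FIRST = 0x1F70
RANGE_LAST = 0x1F7D


def simple_redirect(codepoint):
    """Return the target of a single-codepoint canonical decomposition, or None."""
    fields = unicodedata.decomposition(chr(codepoint)).split()
    if len(fields) == 1 and not fields[0].startswith("<"):
        return int(fields[0], 16)
    return None


def follow_redirect(codepoint):
    """Hypothetical filter: follow a simple redirection only inside the range."""
    if RANGE_FIRST <= codepoint <= RANGE_LAST:
        return simple_redirect(codepoint)
    return None


if __name__ == "__main__":
    for cp in range(RANGE_FIRST, RANGE_LAST + 1):
        target = follow_redirect(cp)
        if target is not None:
            print(f"U+{cp:04X} {chr(cp)} -> U+{target:04X} {chr(target)}")
```

With this restriction only the seven oxia characters in the range are redirected (to their tonos equivalents); the varia characters already decompose directly to a base letter plus a combining mark, and redirections outside the range are ignored.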

~
Cees
