Re: BUG #15347: Unaccent for greek characters does not work

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tasos Maschalidis <TaS(dot)O(dot)S(at)hotmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15347: Unaccent for greek characters does not work
Date: 2018-08-28 03:20:40
Message-ID: 20180828032040.GE29157@paquier.xyz
Lists: pgsql-bugs

On Tue, Aug 28, 2018 at 10:50:38AM +1200, Thomas Munro wrote:
> Fair criticism, here's a version with comments.

Thanks, that's way better in my opinion. In the range of fancy things,
today I discovered the Python module unicodedata, which can replace,
for example, 0x03b1 with ord("\N{GREEK SMALL LETTER ALPHA}"), perhaps
leading to more readable code.
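For illustration, here is a small sketch of the spellings being compared for U+03B1. Strictly speaking, the \N{...} escape is part of Python's string-literal syntax. The unicodedata module provides lookup() (name to character) and name() (character to name), which are the calls that tie source code back to UnicodeData.txt. The variable names are only illustrative.

```python
import unicodedata

# Three equivalent spellings of the code point of GREEK SMALL LETTER ALPHA.
by_hex = 0x03B1                                                  # bare hex literal
by_escape = ord("\N{GREEK SMALL LETTER ALPHA}")                  # \N{...} string escape
by_lookup = ord(unicodedata.lookup("GREEK SMALL LETTER ALPHA"))  # unicodedata.lookup()

# unicodedata.name() goes the other way: code point -> UnicodeData.txt name.
name = unicodedata.name(chr(by_hex))

print(by_hex == by_escape == by_lookup)  # True
print(name)                              # GREEK SMALL LETTER ALPHA
```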

Jokes aside, I would have preferred that you use the Unicode code points
directly, as those are easier to look up in UnicodeData.txt, say
'\u03B1' for small alpha. If you want to go with the hex codes, it would
be a better reference to copy/paste the character names directly from
UnicodeData.txt, as those are easier to search for in the future, perhaps
along with their code points:
- GREEK SMALL LETTER ALPHA
- GREEK SMALL LETTER OMEGA
- GREEK CAPITAL LETTER ALPHA
- GREEK CAPITAL LETTER OMEGA
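As a rough sketch of the "keep the UnicodeData.txt name next to the value" idea, a range table could store the names listed above and resolve them with unicodedata.lookup(). The table GREEK_RANGES and the helpers resolve() and is_greek_plain_letter() are hypothetical. They are not taken from the patch or from generate_unaccent_rules.py. The code assumes only the standard library.

```python
import unicodedata

# Hypothetical range table: each bound is written as its UnicodeData.txt
# name, so it stays greppable, and is resolved to a code point on use.
GREEK_RANGES = (
    ("GREEK SMALL LETTER ALPHA", "GREEK SMALL LETTER OMEGA"),
    ("GREEK CAPITAL LETTER ALPHA", "GREEK CAPITAL LETTER OMEGA"),
)

def resolve(name):
    """Return the integer code point for a UnicodeData.txt character name."""
    return ord(unicodedata.lookup(name))

def is_greek_plain_letter(codepoint):
    """True if codepoint lies inside one of the named Greek ranges (inclusive)."""
    return any(resolve(lo) <= codepoint <= resolve(hi) for lo, hi in GREEK_RANGES)

# Print each range with its hex bounds, the '\u03B1' style of reference.
for lo, hi in GREEK_RANGES:
    print("U+%04X..U+%04X  %s .. %s" % (resolve(lo), resolve(hi), lo, hi))
```

The check is purely numeric. Accented forms such as U+0386 (GREEK CAPITAL LETTER ALPHA WITH TONOS) fall outside both ranges.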

Running generate_unaccent_rules.py, I get the same result for
unaccent.rules as you do.
--
Michael
