Re: BUG #15347: Unaccent for greek characters does not work

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tasos Maschalidis <TaS(dot)O(dot)S(at)hotmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15347: Unaccent for greek characters does not work
Date: 2018-08-23 22:16:14
Message-ID: CAEepm=0RUhOuvQs2LQnFYzR4GWHtn6wUT9UaKi+vC0erKW4=dw@mail.gmail.com
Lists: pgsql-bugs

On Fri, Aug 24, 2018 at 12:22 AM, Tasos Maschalidis <TaS(dot)O(dot)S(at)hotmail(dot)com> wrote:
>     return (codepoint.id >= ord('a') and codepoint.id <= ord('z')) or \
>            (codepoint.id >= ord('A') and codepoint.id <= ord('Z')) or \
>            (codepoint.id >= ord('α') and codepoint.id <= ord('ω')) or \
>            (codepoint.id >= ord('Α') and codepoint.id <= ord('Ω'))

Thank you. Here it is in the form of a patch that I propose to commit
to PostgreSQL 12. It adds 221 lines to unaccent.rules. They look
sane to my untrained eye. Do you agree?

Example of use:

postgres=# select unaccent('Θέμα: Re: BUG #15347: Unaccent for greek ...');
unaccent
----------------------------------------------
Θεμα: Re: BUG #15347: Unaccent for greek ...
(1 row)
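
For readers who want to see where a mapping such as έ -> ε comes from, here is a minimal Python sketch of the idea. It is not the actual rule-generation script: the function names is_plain_letter, strip_marks and unaccent_text are hypothetical, and it assumes that stripping the combining marks from the canonical (NFD) decomposition is a fair approximation of what the rules encode. The letter ranges are the ones from the snippet quoted above.

```python
import unicodedata


def is_plain_letter(ch):
    """True if ch is an unaccented Latin or Greek letter (ranges from the quoted snippet)."""
    cp = ord(ch)
    return (ord('a') <= cp <= ord('z')) or \
           (ord('A') <= cp <= ord('Z')) or \
           (ord('α') <= cp <= ord('ω')) or \
           (ord('Α') <= cp <= ord('Ω'))


def strip_marks(ch):
    """Return the plain base letter for an accented character, or None if there is none."""
    # Canonical decomposition splits 'έ' into 'ε' followed by a combining accent.
    decomposed = unicodedata.normalize('NFD', ch)
    # Drop the combining marks and keep whatever base characters remain.
    base = ''.join(c for c in decomposed if not unicodedata.combining(c))
    if base != ch and len(base) == 1 and is_plain_letter(base):
        return base
    return None


def unaccent_text(text):
    """Apply strip_marks to each character, leaving unmapped characters as they are."""
    return ''.join(strip_marks(c) or c for c in text)


if __name__ == '__main__':
    print(unaccent_text('Θέμα: Re: BUG #15347'))
```

Characters that are already plain, such as Θ or ε, are passed through unchanged, which matches the behaviour visible in the psql example above.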

I wondered if the documentation might need a change, but it already
says something broad enough: "A more complete example, which is
directly useful for most European languages, can be found in
unaccent.rules, ...".

--
Thomas Munro
http://www.enterprisedb.com

Attachment: 0001-Add-Greek-characters-to-unaccent.rules.patch (application/octet-stream, 4.1 KB)
