Re: BUG #15347: Unaccent for greek characters does not work

From: Tasos Maschalidis <tas(dot)o(dot)s(at)hotmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15347: Unaccent for greek characters does not work
Date: 2018-08-23 22:47:59
Message-ID: VI1PR01MB38531B89D1413B9C2307594DB5370@VI1PR01MB3853.eurprd01.prod.exchangelabs.com
Lists: pgsql-bugs

Hi Thomas,

The results are legit for all vowels. There is only one thing missing, which I guess does fall into unaccent functionality. When a "σ" is the last letter of a word, it is written as "ς", unless the whole word is in capitals, in which case it stays the same ("Σ") even at the end of the word. In searches it is useful to convert any "ς" to "σ". I had included this in a custom unaccent.rules file I was using, and it produced the desired results. For example, searching for "Θωμάς" would not match "ΘΩΜΑΣ" unless such a conversion exists. I am not sure whether that should be taken care of somewhere else, but in my case (and also in the gist I sent you, check the last comments) it proved useful and made sense.
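
To illustrate the conversion described above, here is a minimal Python sketch. It is not how unaccent works internally: the helper names strip_accents and search_key are made up for this example, and accent removal is approximated with Unicode NFD decomposition rather than a rules file.

```python
import unicodedata


def strip_accents(text):
    # Decompose each character (e.g. "ά" becomes "α" plus a combining accent),
    # then drop the combining marks. This only approximates unaccent.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_key(text):
    # Hypothetical search key: remove accents, lowercase, and fold the
    # final-sigma form "ς" into the regular "σ".
    return strip_accents(text).lower().replace("ς", "σ")


print(search_key("Θωμάς") == search_key("ΘΩΜΑΣ"))
```

The explicit replace step makes the key independent of how the lowercasing step happens to treat a sigma at the end of a word.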

Thank you,
Tasos Maschalidis
________________________________
From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Sent: Friday, August 24, 2018 1:16:14 AM
To: Tasos Maschalidis
Cc: PostgreSQL mailing lists
Subject: Re: BUG #15347: Unaccent for greek characters does not work

On Fri, Aug 24, 2018 at 12:22 AM, Tasos Maschalidis <TaS(dot)O(dot)S(at)hotmail(dot)com> wrote:
> return (codepoint.id >= ord('a') and codepoint.id <= ord('z')) or \
> (codepoint.id >= ord('A') and codepoint.id <= ord('Z')) or \
> (codepoint.id >= ord('α') and codepoint.id <= ord('ω')) or \
> (codepoint.id >= ord('Α') and codepoint.id <= ord('Ω'))

Thank you. Here it is in the form of a patch that I propose to commit
to PostgreSQL 12. It adds 221 lines to unaccent.rules. They look
sane to my untrained eye. Do you agree?
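
For readers who want a feel for what such rules look like, the following Python sketch derives accented-to-plain Greek pairs from Unicode decompositions. It is only an assumption-laden illustration, not the script used to produce the patch: the names is_plain_greek and greek_rules are invented here, the two scanned code point ranges are my own choice, and the number of pairs it yields is not claimed to match the patch.

```python
import unicodedata


def is_plain_greek(ch):
    # Same range test as the quoted snippet, restricted to the Greek ranges.
    return "α" <= ch <= "ω" or "Α" <= ch <= "Ω"


def greek_rules():
    # Scan the Greek and Greek Extended blocks (ranges assumed for this sketch).
    rules = []
    for cp in list(range(0x0370, 0x0400)) + list(range(0x1F00, 0x2000)):
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        # Keep characters that are a plain Greek letter plus combining marks.
        if marks and is_plain_greek(base) and all(
            unicodedata.combining(m) for m in marks
        ):
            rules.append((ch, base))
    return rules


# Print a few pairs as "source<TAB>target", one pair per line.
for source, target in greek_rules()[:5]:
    print(f"{source}\t{target}")
```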

Example of use:

postgres=# select unaccent('Θέμα: Re: BUG #15347: Unaccent for greek ...');
unaccent
----------------------------------------------
Θεμα: Re: BUG #15347: Unaccent for greek ...
(1 row)

I wondered if the documentation might need a change, but it already
says something broad enough: "A more complete example, which is
directly useful for most European languages, can be found in
unaccent.rules, ...".

--
Thomas Munro
http://www.enterprisedb.com
