Re: BUG #15347: Unaccent for greek characters does not work

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tasos Maschalidis <tas(dot)o(dot)s(at)hotmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15347: Unaccent for greek characters does not work
Date: 2018-08-23 23:22:41
Message-ID: CAEepm=0F3pv9A3_pe=jQMCS9b-iUPjEQjzoftNJjN8FHwXHeKA@mail.gmail.com
Lists: pgsql-bugs

On Fri, Aug 24, 2018 at 10:47 AM, Tasos Maschalidis <tas(dot)o(dot)s(at)hotmail(dot)com> wrote:
> The results are legit for all vowels.

Cool.

> There is only one thing missing, which
> I guess does fall into unaccent functionality. When a "σ" is the
> last letter of a word, it is written as "ς", unless the whole
> word is in capitals, in which case it stays "Σ" even at the end of the word.
> In searches it's useful to convert any "ς" to "σ". I had included that rule in a
> custom unaccent.rules file I was using, and it produced the desired results. For
> example, searching for "Θωμάς" would not match "ΘΩΜΑΣ" unless such a
> conversion exists. Not sure if that should be taken care of somewhere else,
> but in my case (and also in the gist I sent you, check the last comments) it
> proved useful and made sense.

Hmm, I see. Also described here:

https://en.wikipedia.org/wiki/Sigma

I take it you are making searches case insensitive by converting
everything to lower case. Since you have a distinction that exists in
lower case but not in upper case, wouldn't it make more sense to
convert everything to upper case?

postgres=# select upper('Θωμάς'), upper('Θωμάσ'), upper('Θωμάσ') = upper('Θωμάς');
 upper | upper | ?column?
-------+-------+----------
 ΘΩΜΆΣ | ΘΩΜΆΣ | t
(1 row)
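
The same idea can be sketched outside the database. This is only an illustration in Python, not what unaccent does internally: the helper name fold is hypothetical, and stripping accents via NFD decomposition is an assumption standing in for the rules in unaccent.rules. It upper-cases first, so "ς" and "σ" both become "Σ", and then drops combining marks such as the tonos.

```python
import unicodedata


def fold(text: str) -> str:
    """Hypothetical search key: upper-case, then strip combining accents."""
    # Upper-casing maps both final sigma (ς) and regular sigma (σ) to Σ.
    upper = text.upper()
    # NFD splits accented letters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", upper)
    # Drop the combining marks (for example the Greek tonos).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ("Θωμάς", "Θωμάσ", "ΘΩΜΑΣ"):
        print(word, "->", fold(word))
```

With a key like this, all three spellings compare equal, so no separate "ς" to "σ" rule is needed on the lower-case side.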

PS On PostgreSQL mailing lists, we try to avoid "top posting" (=
leaving the message we're replying to below our reply), because it
makes the archive of email threads harder to read.

--
Thomas Munro
http://www.enterprisedb.com
