Re: BUG #15548: Unaccent does not remove combining diacritical characters

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: hugh(at)whtc(dot)ca
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, daniel(at)manitou-mail(dot)org, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date: 2018-12-18 04:10:25
Message-ID: CAEepm=1vRrNyam3ietQQ6ZdJ5JktkUphCEB0=_mPAKz8mjBB-A@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Dec 18, 2018 at 3:05 PM Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Tue, Dec 18, 2018 at 12:03 PM Hugh Ranalli <hugh(at)whtc(dot)ca> wrote:
> +ʹ '
> +ʺ "
> +ʻ '
> +ʼ '
> +ʽ '
> +˂ <
> +˃ >
> +˄ ^
> +ˆ ^
> +ˈ '
> +ˋ `
> +ː :
> +˖ +
> +˗ -
> +˜ ~
>
> I don't think this is quite right. Those don't seem to be the
> combining codepoints[1], and in any case they are being replaced with
> ASCII characters, whereas I thought we wanted to replace them with
> nothing at all. Here is my attempt to come up with a test case using
> combining characters:
>
> select unaccent('un café crème s''il vous plaît');

Oh, I see now that that was just the v34 ASCII transliteration update,
and perhaps the diacritic stripping will be posted separately.
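
To make the distinction in the quoted text concrete, here is a minimal Python sketch. It is an illustration only and not how unaccent itself works (unaccent looks characters up in a rules file); the helper names and the code point list are my own assumptions, transcribed from the quoted rules. It checks that the quoted characters are spacing modifier letters with canonical combining class 0, and shows what stripping actual combining marks looks like on the test string.

```python
import unicodedata

# Code points of the characters in the quoted rules, as I read them
# (assumption: all from the Spacing Modifier Letters block, U+02B0..U+02FF).
MODIFIER_CODEPOINTS = [
    0x02B9, 0x02BA, 0x02BB, 0x02BC, 0x02BD,
    0x02C2, 0x02C3, 0x02C4, 0x02C6, 0x02C8,
    0x02CB, 0x02D0, 0x02D6, 0x02D7, 0x02DC,
]


def is_combining(ch: str) -> bool:
    """True when the code point has a non-zero canonical combining class."""
    return unicodedata.combining(ch) != 0


def strip_combining(text: str) -> str:
    """Decompose to NFD, then drop every combining mark (replace with nothing)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not is_combining(ch))


if __name__ == "__main__":
    # None of the quoted characters is a combining code point.
    print(any(is_combining(chr(cp)) for cp in MODIFIER_CODEPOINTS))

    # The same phrase written with combining marks (NFD form), then stripped.
    phrase = unicodedata.normalize("NFD", "un caf\u00e9 cr\u00e8me s'il vous pla\u00eet")
    print(strip_combining(phrase))
```

A test string for unaccent would need to be in the decomposed form (for example "e" followed by U+0301) to exercise combining characters at all; a precomposed "é" (U+00E9) is a single code point and does not.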

--
Thomas Munro
http://www.enterprisedb.com
