Re: BUG #15548: Unaccent does not remove combining diacritical characters

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: hugh(at)whtc(dot)ca, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, daniel(at)manitou-mail(dot)org, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date: 2018-12-18 04:57:08
Message-ID: 20181218045708.GI1532@paquier.xyz
Lists: pgsql-bugs pgsql-hackers

On Tue, Dec 18, 2018 at 03:05:00PM +1100, Thomas Munro wrote:
> I don't think this is quite right. Those don't seem to be the
> combining codepoints[1], and in any case they are being replaced with
> ASCII characters, whereas I thought we wanted to replace them with
> nothing at all. Here is my attempt to come up with a test case using
> combining characters:
>
> select unaccent('un café crème s''il vous plaît');
>
> It's not stripping the accents. I've attached that in a file for
> reference so you can run it with psql -f x.sql, and you can see that
> it's using combining code points (code points 0301, 0300, 0302 which
> come out as cc81, cc80, cc82 in UTF-8) like so:

Could you also add some tests to contrib/unaccent/sql/unaccent.sql at
the same time? That would make it easy to check the extent of the
patches proposed on this thread.
--
Michael
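
For readers following along, the code points mentioned in the quoted text (0300, 0301, 0302) and their UTF-8 encodings can be checked with a short Python sketch. This is only an illustration of what "combining" means and of what removing such marks would look like; it is not how unaccent is implemented, and the helper names utf8_hex and strip_combining are hypothetical.

```python
import unicodedata


def utf8_hex(char: str) -> str:
    """Return the UTF-8 encoding of a single character as a hex string."""
    return char.encode("utf-8").hex()


def strip_combining(text: str) -> str:
    """Drop every character with a non-zero canonical combining class.

    Illustration only: this mimics "replace combining marks with nothing",
    it is not the unaccent rules file or its lookup logic.
    """
    return "".join(ch for ch in text if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    # The three combining marks named in the quoted message.
    for mark in ("\u0300", "\u0301", "\u0302"):
        print(f"U+{ord(mark):04X} {unicodedata.name(mark)} -> {utf8_hex(mark)}")

    # Assumed test input: the quoted sentence in decomposed (NFD) form, so
    # each accent is a separate combining code point after its base letter.
    decomposed = unicodedata.normalize("NFD", "un café crème s'il vous plaît")
    print(strip_combining(decomposed))
```

The decomposed string here is built with NFD normalization, which is an assumption about how the attached x.sql file was produced; the message itself only says that it uses combining code points.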
