Re: BUG #15548: Unaccent does not remove combining diacritical characters

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, hugh(at)whtc(dot)ca, daniel(at)manitou-mail(dot)org, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date: 2018-12-18 06:07:35
Message-ID: 20181218060735.GL1532@paquier.xyz
Lists: pgsql-bugs pgsql-hackers

On Tue, Dec 18, 2018 at 12:36:02AM -0500, Tom Lane wrote:
> tl;dr: I think we should convert unaccent.sql and unaccent.out
> to UTF8 encoding. Then, adding more test cases for this patch
> will be easy.

Do you think that we could also remove the non-ASCII characters from the
tests? It would be easy enough to use E'\xNN' (UTF-8 hex) or such in the
input, and show the output as bytea. That's harder to read, but the last
time this code was touched (5e8d670) we discussed not using UTF-8 in the
Python script so that folks with simple terminals can still touch the
code. The characters used could be documented as comments in the tests.
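
To make the idea concrete, here is a minimal sketch of how an ASCII-only test input and its expected bytea-style output could be derived from a given string. The helper names (to_escape_literal, to_bytea_hex) are hypothetical and are not part of unaccent or of the existing Python script. The sketch assumes a UTF8 server encoding and the default hex format for bytea output.

```python
def to_escape_literal(text: str) -> str:
    """Spell out the UTF-8 bytes of text as an ASCII-only E'\\xNN...' literal.

    Hypothetical helper: every byte is escaped, including plain ASCII,
    so the literal never needs quote or backslash handling.
    """
    escaped = "".join(f"\\x{b:02x}" for b in text.encode("utf-8"))
    return f"E'{escaped}'"


def to_bytea_hex(text: str) -> str:
    """Return the UTF-8 bytes of text in the hex form used for bytea output."""
    return "\\x" + text.encode("utf-8").hex()


# "e" followed by U+0301 COMBINING ACUTE ACCENT, itself written in ASCII.
sample = "e\u0301"
print(to_escape_literal(sample))  # E'\x65\xcc\x81'
print(to_bytea_hex(sample))       # \x65cc81
```

The first value is what would go into the test input, the second is what the expected output would contain, and a comment next to each could name the character for readers.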
--
Michael
