Re: BUG #15548: Unaccent does not remove combining diacritical characters

From: Hugh Ranalli <hugh(at)whtc(dot)ca>
To: thomas(dot)munro(at)enterprisedb(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Verite <daniel(at)manitou-mail(dot)org>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date: 2018-12-18 13:01:00
Message-ID: CAAhbUMMzPERSe3KfKKQfR4COJCZSrss1G7KRyUraYJyvrVyOUg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Mon, 17 Dec 2018 at 23:05, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
wrote:

> +ʹ '
> +ʺ "
> +ʻ '
> +ʼ '
> +ʽ '
> +˂ <
> +˃ >
> +˄ ^
> +ˆ ^
> +ˈ '
> +ˋ `
> +ː :
> +˖ +
> +˗ -
> +˜ ~
>
These aren't the combining codepoints. They're new substitutions defined in
r34 of the Latin-ASCII transliteration file. I had wondered about those,
too, and did some testing.
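
The distinction can be checked with Python's standard unicodedata module. This is only an illustrative sketch, not part of the patch or of generate_unaccent_rules.py; the helper name is_combining_mark is made up for the example. It shows that every character in the quoted hunk falls in the Spacing Modifier Letters block (U+02B0..U+02FF) and that none has a mark general category, whereas U+0301 COMBINING ACUTE ACCENT does.

```python
import unicodedata

# Characters from the quoted hunk above.
quoted = "ʹʺʻʼʽ˂˃˄ˆˈˋː˖˗˜"


def is_combining_mark(ch):
    """Hypothetical helper: True if ch has a mark category (Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


for ch in quoted:
    # Print code point, Unicode name, general category and the mark check.
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)} "
        f"category={unicodedata.category(ch)} "
        f"combining={is_combining_mark(ch)}"
    )

# For contrast: a real combining diacritical (block U+0300..U+036F).
acute = "\u0301"
print(f"U+{ord(acute):04X} {unicodedata.name(acute)} "
      f"combining={is_combining_mark(acute)}")
```

The combining marks are what remains after a precomposed letter is decomposed (NFD), which is why they need separate handling from the substitutions listed above.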

> I don't think this is quite right.

However, you are correct that something isn't right. In testing why I was
getting a different output, I had reverted to the
generate_unaccent_rules.py BEFORE my changes. And then I applied my update
for the transliteration file format to the reverted version. The patch for
generate_unaccent_rules should still be good, but the generated rules file
didn't include the combining diacriticals. In generating that, I want to
double check some of the additions before re-submitting.

On Mon, 17 Dec 2018 at 23:57, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> Could you also add some tests in contrib/unaccent/sql/unaccent.sql at
> the same time? That would be nice to check easily the extent of the
> patches proposed on this thread.

That makes sense. I'm happy to do that. Let me look at that file and see
how extensive the other changes (encoding and removal of special characters)
would be.

Hugh
