Re: BUG #15548: Unaccent does not remove combining diacritical characters

From: Hugh Ranalli <hugh(at)whtc(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, thomas(dot)munro(at)enterprisedb(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date: 2018-12-15 19:05:07
Message-ID: CAAhbUMO+fTZTwigDJ=tB3qFvHMe-xAJO5QpsFPH6Vb2oDYAU6w@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Sat, 15 Dec 2018 at 13:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Hm. Something funny is going on here. When I fetch the two reference
> files from the URLs cited in the script, and do
>
>     python2 generate_unaccent_rules.py --unicode-data-file UnicodeData.txt
>         --latin-ascii-file Latin-ASCII.xml >newrules
>
> I get something that's bit-for-bit the same as what's in unaccent.rules.
> So there's clearly a platform difference between here and there.
>
> I'm using Python 2.6.6, which is what ships with RHEL6; have not tried
> it on anything newer.
>
Well, that's embarrassing. When I looked, I couldn't see anything that
looked platform-specific. I'm on Python 2.7.6, which shipped with Mint 17.
We use other versions of 2.7 on our production platforms. I'll take another
look and check the URLs I am using.
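
For anyone else trying to reproduce the comparison, here is a minimal sketch
of a byte-for-byte check between the regenerated file and the shipped
unaccent.rules. The helper names (first_difference, compare_files,
describe_interpreter) are hypothetical and are not part of
generate_unaccent_rules.py. Printing sys.maxunicode is only a guess at
something worth comparing between machines (it separates narrow and wide
Python 2 builds); whether it has anything to do with the difference seen
here is an assumption.

```python
import sys


def first_difference(old, new):
    """Return the 1-based number of the first differing line,
    or None if the two byte strings are identical."""
    if old == new:
        return None
    # Keep line endings so that a CRLF/LF mismatch counts as a difference.
    old_lines = old.splitlines(True)
    new_lines = new.splitlines(True)
    for number, (a, b) in enumerate(zip(old_lines, new_lines), 1):
        if a != b:
            return number
    # All shared lines match, so one input is a prefix of the other.
    return min(len(old_lines), len(new_lines)) + 1


def compare_files(path_a, path_b):
    """Read both files in binary mode and compare them byte for byte."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_difference(fa.read(), fb.read())


def describe_interpreter():
    """Summarise interpreter details that may differ between platforms."""
    return "Python %s, maxunicode=%#x" % (sys.version.split()[0],
                                          sys.maxunicode)


if __name__ == "__main__" and len(sys.argv) == 3:
    print(describe_interpreter())
    line = compare_files(sys.argv[1], sys.argv[2])
    if line is None:
        print("files are identical")
    else:
        print("first difference at line %d" % line)
```

Saved under any name, it would be run with the two paths as arguments, for
example unaccent.rules and newrules. Running it on both machines and
comparing the output should at least show where the generated files start
to diverge.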
