Re: BUG #13440: unaccent does not remove all diacritics

From: Léonard Benedetti <benedetti(at)mlpo(dot)fr>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13440: unaccent does not remove all diacritics
Date: 2016-03-12 03:44:27
Message-ID: 56E3909B.5060702@mlpo.fr
Lists: pgsql-bugs

12/03/2016 04:02, Peter Eisentraut wrote:
> On 3/11/16 1:16 PM, Tom Lane wrote:
>> Léonard Benedetti <benedetti(at)mlpo(dot)fr> writes:
>>> Despite all that, I think this transition to Python 3 is wise, it is
>>> available since 2008. Python 2 is legacy and its last version (2.7) is a
>>> “end-of-life release”.
>> Doesn't matter. We support both Python 2 and 3, and this script must
>> do so as well, else it's not getting committed. Any desupport for
>> Python 2 in PG is very far away; no one has even suggested we consider
>> it yet.
> This script is only run occasionally when the unaccent data needs to be
> updated from Unicode data, so it's not really that important what
> language and version it's written in. That said, the mentioned reason
> for changing this to Python 3 is so that one can include Unicode
> characters into the source text, which I find undesirable in general
> (for PostgreSQL source code) and not very useful in this particular
> case. I think the script can be kept in Python 2 style. Making it
> upward compatible with Python 3 can be a separate (small) project.
>
I completely agree. This script does not need to be run regularly (as
mentioned, only when the Unicode standard or the characters of the
transliterator change). Moreover, even when it does need to be run, users
can wait for the next version of PostgreSQL, in which the rules file will
already have been updated. So it is indeed a one-off task, and the
language of this script is not that important.

However, as for support for Unicode characters in the source code, the
version of Python makes little difference (both versions support it). The
change to Python 3 was made rather to anticipate the end of life of
Python 2. But as Tom Lane pointed out, that is not going to happen soon
(according to PEP 0373: “The current plan is to support [Python 2] for at
least 10 years from the initial 2.7 release. This means there will be
bugfix releases until 2020.”). Furthermore, as I stated above, the
adaptation to Python 3 was quite trivial and could easily be made in due
course.
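
To illustrate the point about both versions, here is a minimal sketch
(not the script from the patch; strip_diacritics is a hypothetical helper
introduced only for this example) of code that should run unchanged on
Python 2.7 and Python 3. It uses \u escapes so the source stays pure
ASCII, which sidesteps the question of Unicode characters in the source
text altogether:

```python
# Sketch only: intended to run unchanged on Python 2.7 and Python 3.
from __future__ import print_function, unicode_literals

import unicodedata


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "\u00e9" is LATIN SMALL LETTER E WITH ACUTE. A literal character would
# also work in both versions: Python 3 reads source as UTF-8 by default,
# and Python 2 accepts it with a "coding: utf-8" declaration (PEP 263).
print(strip_diacritics("L\u00e9onard"))
```

Note that this only handles characters that decompose into a base letter
plus combining marks; letters such as "ø" or "ß" have no such
decomposition and would still need explicit rules.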

So I think we can keep just a Python 2 version for now. If everyone
agrees, I'll update the files and the patch.

Léonard Benedetti
