From: | Hugh Ranalli <hugh(at)whtc(dot)ca> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, thomas(dot)munro(at)enterprisedb(dot)com, Daniel Verite <daniel(at)manitou-mail(dot)org>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
Date: | 2019-01-10 02:52:05 |
Message-ID: | CAAhbUMNZ0ooK6SzLNdkxzdBsQHOJf_rg_EjwoNL8QHTwQuriRw@mail.gmail.com |
Lists: | pgsql-bugs pgsql-hackers |
On Tue, 8 Jan 2019 at 22:53, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> I have been doing a bit more than a review by studying by myself the
> new format and the old format, and the way we could do things in the
> XML parsing part, and hacked the code by myself. On top of the
> incorrect URL for Latin-ASCII.xml, I have noticed as well that there
> should be only one block transforms/transform/tRule in the source, so
> I think that we should add an assertion on that as a sanity check. I
> have also changed the code to use splitlines(), which is more portable
> across platforms, and added an extra regression test for the new
> characters added to unaccent.rules. This does not close this thread
> but we can support the new format this way. I have also documented
> the way to browse the full set of releases for Latin-ASCII.xml, and
> precisely which version has been used for this patch.
>
> This does not close yet the part for diacritical characters, but
> supporting the new format is a step into this direction. What do
> you think?
>
Hi Michael,
Thank you for putting so much effort into this. I think that looks great.
When I was working on this, I found that I could parse both the pre-r29 and
post-r29 formats, so I went with that, but I agree that there's probably no
good reason to keep doing so.
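For anyone following along, here is a minimal sketch of the sanity check you describe: find the transforms/transform/tRule block with Python's xml.etree.ElementTree, assert that there is exactly one, and split its rule text with splitlines(). This is not the code from the patch. The function name extract_rule_lines and the trimmed sample document are hypothetical, and the sample only assumes the single-tRule layout you mention.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for Latin-ASCII.xml; the real file
# carries a version element, many more rules, and comments.
SAMPLE_XML = """<supplementalData>
  <transforms>
    <transform source="Latin" target="ASCII" direction="forward">
      <tRule><![CDATA[
Æ → AE ;
ß → ss ;
]]></tRule>
    </transform>
  </transforms>
</supplementalData>
"""

def extract_rule_lines(xml_text):
    """Hypothetical helper: return the non-blank lines of the single tRule block."""
    root = ET.fromstring(xml_text)
    rules = root.findall("transforms/transform/tRule")
    # Sanity check: the format we expect has exactly one tRule block.
    assert len(rules) == 1, "expected exactly one transforms/transform/tRule block"
    text = rules[0].text or ""
    # splitlines() handles \n, \r\n and \r; blank lines are dropped here.
    return [line for line in text.splitlines() if line.strip()]

print(extract_rule_lines(SAMPLE_XML))
```

A document with a second transform/tRule block would trip the assertion instead of being silently half-parsed.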
And thank you for the information on splitlines; that's a method I had
overlooked. .split('\n') should give the same lines for ordinary line
endings if the file is read, as usual, with Python's universal newlines
support, but it's nice to have a method guaranteed to work in all cases.
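A small sketch of where the two actually differ, for the record. Universal newlines is applied when a file is read in text mode (open() defaults to newline=None), translating '\r\n' and '\r' to '\n'; str.split('\n') itself does no translation. The helper name split_both_ways is only for illustration.

```python
import os
import tempfile

def split_both_ways(text):
    """Illustrative helper: return (text.split('\\n'), text.splitlines())."""
    return text.split("\n"), text.splitlines()

# LF-only text: split('\n') leaves a trailing empty string, splitlines() does not.
print(split_both_ways("a\nb\n"))

# CRLF text held in memory: split('\n') keeps the '\r', splitlines() removes it.
print(split_both_ways("a\r\nb\r\n"))

# Reading the same bytes from a file in text mode (newline=None by default)
# applies universal newlines, so the '\r' characters are gone before splitting.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a\r\nb\r\n")
    path = tmp.name
try:
    with open(path, encoding="ascii") as f:
        data = f.read()
finally:
    os.remove(path)
print(split_both_ways(data))
```

So after a universal-newlines read the only everyday difference is the trailing empty string; splitlines() additionally splits on a few Unicode separators such as U+2028.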
Best wishes,
Hugh