Re: BUG #15548: Unaccent does not remove combining diacritical characters

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: hugh(at)whtc(dot)ca,pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date: 2018-12-13 16:26:48
Message-ID: 5d77cc08-d582-4f83-a17f-f2c992d123a9@manitou-mail.org
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:

> Hm, I thought the OP's proposal was just to make unaccent drop
> combining diacriticals independently of context, which'd avoid the
> combinatorial-growth problem.

In that case, this could be achieved by simply appending the
diacriticals themselves to unaccent.rules, since replacing a
string with an empty string is already supported as a rule.
The current file doesn't seem to have any such rules, but
https://www.postgresql.org/docs/11/unaccent.html says:

"Alternatively, if only one character is given on a line, instances
of that character are deleted; this is useful in languages where
accents are represented by separate characters"
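
To illustrate the idea, here is a minimal Python sketch, not part of
unaccent itself. It assumes the characters to drop are the Unicode
"Combining Diacritical Marks" block (U+0300 to U+036F); other combining
blocks would need the same treatment. The helper names
combining_rule_lines and apply_deletion_rules are hypothetical, and the
second one only simulates what single-character rules would do to a
decomposed (NFD) string.

```python
import unicodedata

# Assumption: only the "Combining Diacritical Marks" block is covered.
COMBINING_START = 0x0300
COMBINING_END = 0x036F


def combining_rule_lines():
    """Return one single-character rule line per combining mark.

    Hypothetical helper: each returned string is what one line appended
    to unaccent.rules would contain (a lone character, meaning "delete").
    """
    return [chr(cp) for cp in range(COMBINING_START, COMBINING_END + 1)]


def apply_deletion_rules(text, rule_lines):
    """Simulate single-character rules by deleting every listed character."""
    to_delete = set(rule_lines)
    return "".join(ch for ch in text if ch not in to_delete)


rules = combining_rule_lines()

# Decompose so that each accent becomes a separate combining character.
decomposed = unicodedata.normalize("NFD", "Vérité")
stripped = apply_deletion_rules(decomposed, rules)
print(stripped)
```

Writing "\n".join(rules) to a UTF-8 file would give the lines to append;
whether the block above is the right set of characters is a judgment
call left open here.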

Incidentally, we may want to improve this bit of the docs to explicitly
mention Unicode decomposed forms as a use case for removing
characters. In fact, I wonder if that isn't what it's already
trying to express, while confusing "languages" with "forms".

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
