Re: BUG #15548: Unaccent does not remove combining diacritical characters

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Daniel Verite" <daniel(at)manitou-mail(dot)org>
Cc:	hugh(at)whtc(dot)ca, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date:	2018-12-13 15:05:42
Message-ID:	10200.1544713542@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-hackers

"Daniel Verite" <daniel(at)manitou-mail(dot)org> writes:
> PG Bug reporting form wrote:
>> ... For example, A
>> followed by U+0300 displays À. However, unaccent is not removing
>> these accents.

> Short of having the input normalized by the application, ISTM that the
> best solution would be to provide functions to do it in Postgres, so
> you'd just write for example:
> unaccent(unicode_NFC(string))

That might be worthwhile, but it seems independent of this issue.

> Otherwise unaccent.rules can be customized. You may add replacements
> for letter+diacritical sequences that are missing for the languages
> you have to deal with. But doing it in general for all diacriticals
> multiplied by all base characters seems unrealistic.

Hm, I thought the OP's proposal was just to make unaccent drop
combining diacriticals independently of context, which'd avoid the
combinatorial-growth problem.

regards, tom lane
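
A minimal sketch of the two ideas discussed above, written in Python rather than against the unaccent extension itself. It assumes "combining diacriticals" means the Unicode block U+0300..U+036F; the function name drop_combining_diacriticals is hypothetical, and the NFC step uses Python's unicodedata module as a stand-in for the proposed (not existing) unicode_NFC() function.

```python
import unicodedata

# Assumption: "combining diacriticals" = the Combining Diacritical Marks
# block, U+0300..U+036F. Other combining ranges are not handled here.
COMBINING_FIRST = 0x0300
COMBINING_LAST = 0x036F


def drop_combining_diacriticals(text: str) -> str:
    """Remove every code point in U+0300..U+036F, whatever precedes it.

    The rule ignores the base character, so no table of
    base-letter x diacritical combinations is needed.
    """
    return "".join(
        ch for ch in text
        if not (COMBINING_FIRST <= ord(ch) <= COMBINING_LAST)
    )


decomposed = "A\u0300"   # 'A' followed by U+0300, as in the bug report
precomposed = "\u00C0"   # the single code point for the same letter

# Context-independent removal handles the decomposed form directly.
print(drop_combining_diacriticals(decomposed) == "A")

# The alternative: compose to NFC first, so that rules keyed on
# precomposed characters can match.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Both comparisons print True. The first approach leaves precomposed characters untouched, so it would complement, not replace, rules that map precomposed letters to their base letters.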

In response to

Re: BUG #15548: Unaccent does not remove combining diacritical characters at 2018-12-13 13:19:51 from Daniel Verite

Responses

Re: BUG #15548: Unaccent does not remove combining diacritical characters at 2018-12-13 16:26:48 from Daniel Verite
