From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
Cc: | hugh(at)whtc(dot)ca, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15548: Unaccent does not remove combining diacritical characters |
Date: | 2018-12-13 15:05:42 |
Message-ID: | 10200.1544713542@sss.pgh.pa.us |
Lists: | pgsql-bugs pgsql-hackers |
"Daniel Verite" <daniel(at)manitou-mail(dot)org> writes:
> PG Bug reporting form wrote:
>> ... For example, A
>> followed by U+0300 displays À. However, unaccent is not removing
>> these accents.
> Short of having the input normalized by the application, ISTM that the
> best solution would be to provide functions to do it in Postgres, so
> you'd just write for example:
> unaccent(unicode_NFC(string))
That might be worthwhile, but it seems independent of this issue.
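
To illustrate the normalization idea quoted above, here is a minimal sketch in Python rather than in Postgres. The RULES dictionary and unaccent_precomposed() are hypothetical stand-ins for a rules table keyed on precomposed characters; they are not the unaccent extension's actual code or data, and unicode_NFC() in the quoted text is a proposed function, not an existing one.

```python
import unicodedata

# Hypothetical stand-in for a tiny slice of a rules table:
# precomposed character -> unaccented replacement.
RULES = {"\u00C0": "A", "\u00E9": "e"}

def unaccent_precomposed(text: str) -> str:
    """Replace only code points listed in RULES, one code point at a time."""
    return "".join(RULES.get(ch, ch) for ch in text)

# 'A' followed by U+0300 COMBINING GRAVE ACCENT (two code points).
decomposed = "A\u0300"

# NFC composes the pair into the single code point U+00C0.
composed = unicodedata.normalize("NFC", decomposed)

# Without normalization the combining accent passes through untouched.
untouched = unaccent_precomposed(decomposed)

# After normalization the precomposed character matches a rule.
stripped = unaccent_precomposed(composed)
```

Under these assumptions, only the normalized input loses its accent, which is why normalization helps a table of precomposed characters but does not by itself address combining marks that have no precomposed form.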
> Otherwise unaccent.rules can be customized. You may add replacements
> for letter+diacritical sequences that are missing for the languages
> you have to deal with. But doing it in general for all diacriticals
> multiplied by all base characters seems unrealistic.
Hm, I thought the OP's proposal was just to make unaccent drop
combining diacriticals independently of context, which'd avoid the
combinatorial-growth problem.
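
As a rough sketch of what dropping combining diacriticals independently of context could look like, again in Python and not taken from unaccent itself: the function name is hypothetical, and restricting the filter to the Combining Diacritical Marks block (U+0300 to U+036F) is an assumption made for illustration.

```python
# Assumed range: the Unicode "Combining Diacritical Marks" block.
COMBINING_FIRST = 0x0300
COMBINING_LAST = 0x036F

def drop_combining_marks(text: str) -> str:
    """Remove every code point in the assumed combining block,
    regardless of which base character precedes it."""
    return "".join(
        ch for ch in text
        if not (COMBINING_FIRST <= ord(ch) <= COMBINING_LAST)
    )

# 'A' + U+0300, then 'e' + U+0301: both marks are dropped.
plain = drop_combining_marks("A\u0300 e\u0301")

# A precomposed character is a single code point outside the block,
# so this filter leaves it alone; it would still need a table entry.
precomposed = drop_combining_marks("\u00C0")
```

The filter needs one rule per combining mark rather than one per base-plus-mark pair, so its size does not depend on how many base characters exist.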
regards, tom lane