Re: BUG #13440: unaccent does not remove all diacritics

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Gradek <mike(at)busbud(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13440: unaccent does not remove all diacritics
Date: 2015-06-23 13:50:36
Message-ID: 5589642C.3000201@gmx.net
Lists: pgsql-bugs

On 6/18/15 5:17 PM, Alvaro Herrera wrote:
> To me, conceptually what unaccent does is turn whatever junk you have
> into a very basic common alphabet (ascii); then it's very easy to do
> full text searches without having to worry about what accents the people
> did or did not use in their searches. If we say "okay, but that funny
> char is not an accent so let's leave it alone" then the charter doesn't
> sound so useful to me.

I think unaccent is one of those contrib things that are useful but not
really fully thought out and therefore won't ever become an official
core feature. It is what it is, and we can tweak it slightly, but
thinking too hard about what it "should" do won't lead anywhere.

If we wanted to do this "properly", we could do something like: perform
Unicode canonical decomposition, then strip out all combining
characters. I don't know how useful that is in practice, though. And
it won't "solve" issues such as German ß, which probably doesn't have a
one-size-fits-all solution.
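
For illustration, the decompose-then-strip idea could be sketched roughly as
follows. This is not how unaccent works (it uses a rules file); it is a
minimal sketch in Python, assuming NFD as the canonical decomposition and
treating "combining character" as any code point with a nonzero canonical
combining class. The function name strip_combining is made up for this
example.

```python
import unicodedata


def strip_combining(text: str) -> str:
    """Decompose canonically (NFD), then drop combining characters.

    Assumption: "combining character" is taken to mean a code point whose
    canonical combining class is nonzero.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    print(strip_combining("Crème Brûlée"))  # Creme Brulee
    print(strip_combining("Straße"))        # Straße (ß has no decomposition)
    print(strip_combining("Øresund"))       # Øresund (Ø has no decomposition)
```

The last two lines show the limitation: characters such as ß or Ø have no
canonical decomposition, so this approach leaves them untouched.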
