From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Thom Brown <thom(at)linux(dot)com> |
Cc: | PGSQL Mailing List <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Unaccent characters |
Date: | 2012-04-20 17:43:29 |
Message-ID: | 1334943809.19045.10.camel@vanquo.pezone.net |
Lists: | pgsql-general |
On Fri, 2012-04-20 at 09:15 +0100, Thom Brown wrote:
> I had a look at the unaccent.rules file and noticed the following
> characters aren't properly converted:
>
> ß (U+00DF) An eszett represents a double-s "SS" but this replaces it
> with one "S". Shouldn't this be replaced with "SS"?
Probably, but it certainly shouldn't be upper case.
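The ß case shows that a rule may need to produce more than one character. Below is a minimal Python sketch of a per-character substitution table. It assumes a rules format in which each source character maps to a replacement string of any length. The table contents and the name `unaccent_sketch` are hypothetical, and this is not the module's actual C implementation:

```python
# Hypothetical rule table in the spirit of unaccent.rules: each source
# character maps to a replacement string, which may be longer than one
# character. Only a few entries are shown.
RULES = {
    "é": "e",
    "ä": "a",
    "ß": "ss",  # lower case, as suggested in the reply; not "SS"
}

def unaccent_sketch(text, rules):
    """Replace each character that has a rule; keep all others unchanged."""
    return "".join(rules.get(ch, ch) for ch in text)

print(unaccent_sketch("Straße", RULES))  # Strasse
```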
> Æ (U+00C6) and æ (U+00E6) These don't have an accent, diacritic or
> anything added to a single latin character. It's simply a ligature of
> "A" and "E" or "a" and "e". If someone has the text "æther", I would
> imagine they'd be surprised at it being converted to "ather" instead
> of "aether".
It depends on what the point of this module is supposed to be. Doing
"unaccenting" usefully depends on language and context. For example, it
would be very reasonable to map æ to ae, but in a Scandinavian context,
æ is equivalent to ä, which is mapped to a, which is itself
questionable.
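The following example shows how much the result depends on the chosen table. It uses two hypothetical rule sets: one treats æ as the ligature "ae", and one folds it like ä, following the Scandinavian reading described above. This Python sketch is illustrative only. Neither table is the shipped unaccent.rules, and `unaccent_sketch` is a made-up helper:

```python
# Two hypothetical rule sets: the mapping for æ depends on the
# language the text is assumed to be in.
BASE = {"ä": "a", "ö": "o", "é": "e"}

# Treat æ as the ligature of a and e (e.g. English "æther").
GENERIC = {**BASE, "æ": "ae", "Æ": "AE"}

# Treat æ like ä, which the base table already folds to "a".
SCANDINAVIAN = {**BASE, "æ": BASE["ä"], "Æ": "A"}

def unaccent_sketch(text, rules):
    """Replace each character that has a rule; keep all others unchanged."""
    return "".join(rules.get(ch, ch) for ch in text)

for name, rules in [("generic", GENERIC), ("scandinavian", SCANDINAVIAN)]:
    print(name, unaccent_sketch("æther", rules))
# generic aether
# scandinavian ather
```

A German-oriented table would arguably map ä to "ae" instead. This is one reason why even the ä to a rule is questionable.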
> Œ (U+0152) and œ (U+0153). Same as above. This is a ligature of "O"
> and "E" or "o" and "e". Except this time the unaccent module chooses
> the 2nd character instead of the 1st which is confusing.
That certainly seems wrong. It's also worth noting that while æ is in
some languages considered a separate letter, œ is generally just a
typographical ligature.
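Unicode's own character names reflect this distinction. U+0153 is named LATIN SMALL LIGATURE OE, while U+00E6 is named LATIN SMALL LETTER AE. The Python sketch below uses those names as a rough heuristic check. It flags rules that map a ligature to anything other than its component letters. The tab-separated rule lines and the helper names are assumptions for illustration, not the real rules file:

```python
import unicodedata

# Hypothetical rule lines in an assumed "source<TAB>target" form; the œ
# entries reproduce the problem reported in the thread (only one letter kept).
RULE_LINES = [
    "Œ\tE",
    "œ\te",
    "é\te",
]

def expected_ligature_target(ch):
    """Return the letters a Unicode ligature stands for, or None."""
    name = unicodedata.name(ch, "")
    if " LIGATURE " not in name:
        return None
    letters = name.split(" LIGATURE ", 1)[1]  # e.g. "OE"
    return letters.lower() if " SMALL " in name else letters

def check_rules(lines):
    """List (source, target, expected) for ligature rules that look wrong."""
    problems = []
    for line in lines:
        src, dst = line.split("\t")
        want = expected_ligature_target(src)
        if want is not None and dst != want:
            problems.append((src, dst, want))
    return problems

for src, dst, want in check_rules(RULE_LINES):
    print(f"{src} -> {dst!r}, expected {want!r}")
```

Unicode names æ and Æ as letters rather than ligatures, so this check deliberately leaves them alone. Whether to map them to "ae"/"AE" remains the language-dependent choice discussed above.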