Re: Patch for collation using ICU

From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: John Hansen <john(at)geeknet(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Patch for collation using ICU
Date: 2005-03-27 02:24:13
Message-ID: F10BA4C1FCA99DE337055B13@palle.girgensohn.se
Lists: pgsql-hackers

--On Saturday, March 26, 2005 13:59:19 +1100 John Hansen <john(at)geeknet(dot)com(dot)au>
wrote:

>> - ORDER BY is case insensitive when using ICU. This might
>> break the SQL standard (?), but sure is nice :)
>
> This would mean that indexes are also case insensitive right?
> Which makes it a Bad Thing(tm).

Well, no, not really. Indices use collation rules, yes, but upper- and
lowercase strings are not considered *equal*, just "more closely related".
In collation, characters are compared at four levels; see [1] for a good
explanation. This means that indices will use a case-insensitive sort
order (case only breaks ties between strings that are otherwise identical),
but equality does not change, so it shouldn't break anything.
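To make the level idea concrete, here is a toy Python model of multi-level comparison. It is a simplification under stated assumptions, not ICU's actual algorithm: the primary level is the base letter with accents stripped and case folded, the secondary level is the accents, and the tertiary level is case (lowercase first). `collation_key` is a hypothetical helper name.

```python
import unicodedata


def collation_key(s: str) -> tuple:
    """Toy three-level sort key (hypothetical; not ICU's real algorithm).

    Level 1 (primary):   base letters, accents stripped, case folded.
    Level 2 (secondary): the accents (combining marks) on each letter.
    Level 3 (tertiary):  case, with lowercase sorting before uppercase.
    Tuples compare level by level, so a later level only matters when
    all earlier levels are equal.
    """
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFC", s):
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        marks = "".join(c for c in decomposed if unicodedata.combining(c))
        primary.append(base.lower())
        secondary.append(marks)
        tertiary.append(1 if ch.isupper() else 0)
    return (tuple(primary), tuple(secondary), tuple(tertiary))


words = ["b", "A", "a", "B"]
print(sorted(words, key=collation_key))  # ['a', 'A', 'b', 'B']

# Case-different strings sort next to each other but never compare equal.
print(collation_key("a") == collation_key("A"))  # False
```

In this model "a" and "A" land side by side in sort order, yet their keys still differ, which is the distinction between ordering and equality made above. An accent difference outranks a case difference, so "E" sorts before "é".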

>> - When the database is initialized using the C locale,
>> upper() and lower() normally do not work at all for
>> non-ASCII characters even if the database's encoding is say
>> LATIN1 or UNICODE. (does not work for me anyway, on FreeBSD,
>> and this is probably correct since the locale is still `C', I
>> believe?). The ICU patch changes nothing for the LATIN1 case,
>> since it does not act on single byte encodings, but for the
>> UNICODE representation, it works and does what I expect it
>> to, namely upper() and lower() neatly
>> upper- or lowercase diacritical characters, i.e. lower('ÅÄÖ')
>> -> 'åäö'.
>> This is a good thing, although I'm surprised that upper/lower
>> is dragged along with the LC_COLLATE fixation at initdb. I
>> never run initdb in the C locale, but only now do I realize
>> how broken that really is if you need to store anything other
>> than English :-)
>
> That is what I would have expected. However, it probably won't work for
> the more exotic cases, like the Turkish I, which depends on the locale.

Nope, Turkish must of course have its own locale to handle, for example,
its special capital "i". Let's just say it is less broken :)
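The case-mapping behaviour discussed in this exchange can be sketched in Python. This is a hypothetical model, not PostgreSQL or ICU code: `c_locale_lower` assumes the C locale maps only A-Z, `unicode_lower` uses Python's locale-independent `str.lower`, and `turkish_lower` is an illustrative tailoring for the Turkish dotted/dotless i.

```python
def c_locale_lower(s: str) -> str:
    """Rough model of lower() in the C locale: only A-Z are mapped."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def unicode_lower(s: str) -> str:
    """Locale-independent Unicode case mapping (Python's str.lower)."""
    return s.lower()


def turkish_lower(s: str) -> str:
    """Hypothetical Turkish tailoring: dotless I pairs with dotless ı,
    dotted İ pairs with i. Replace these first, then lowercase the rest."""
    return s.replace("I", "ı").replace("İ", "i").lower()


print(c_locale_lower("ÅÄÖ"))         # ÅÄÖ  (unchanged: not A-Z)
print(unicode_lower("ÅÄÖ"))          # åäö
print(unicode_lower("I"))            # i    (wrong for Turkish)
print(turkish_lower("DİYARBAKIR"))   # diyarbakır
```

The default Unicode mapping fixes "ÅÄÖ" but still turns "I" into "i", which is incorrect in Turkish; only a locale-aware mapping gets that case right.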

/Palle

[1]
<http://icu.sourceforge.net/userguide/Collate_Concepts.html#Comparison_Levels>
