From: | "John Sidney-Woollett" <johnsw(at)wardbrook(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | johnsw(at)wardbrook(dot)com, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Unicode + LC_COLLATE |
Date: | 2004-04-22 13:26:58 |
Message-ID: | 3487.192.168.0.64.1082640418.squirrel@mercury.wardbrook.com |
Lists: | pgsql-general |
Tom Lane said:
> C locale basically means "sort by the byte sequence values". It'll do
> something self-consistent, but maybe not what you'd like for UTF8
> characters.
OK, that explains that. I guess I will need to try it out to see what the
effect is on extended character sets.
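As a rough illustration of what "sort by the byte sequence values" means for UTF8 data, here is a minimal Python sketch. It is not PostgreSQL code: the word list and the helper name `byte_order_sort` are made up for this example, and it only assumes that a C-locale comparison behaves like a plain comparison of the encoded bytes.

```python
# Illustrative sketch only: emulate a byte-wise (C locale style) sort
# on UTF-8 encoded text. The sample words are invented for this example.

def byte_order_sort(words):
    """Sort strings by their UTF-8 encoded bytes rather than by language rules."""
    return sorted(words, key=lambda w: w.encode("utf-8"))

words = ["zebra", "Apple", "éclair", "apple", "Zürich"]
result = byte_order_sort(words)

# Uppercase ASCII sorts before lowercase ASCII, and any word starting with
# a non-ASCII character sorts after all ASCII words, because its first
# UTF-8 byte is 0x80 or higher.
print(result)  # ['Apple', 'Zürich', 'apple', 'zebra', 'éclair']
```

The ordering is self-consistent, but "éclair" lands after "zebra", which is probably not what a user expecting dictionary order would like.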
>> Our database is UNICODE with LC_COLLATE=en_US.iso885915.
> Does that sort rationally at all? I should think you'd need to specify
> an LC_COLLATE setting that's designed for UTF8 encoding, not 8859-15.
Er..., actually the LC_COLLATE for the DB in question is C - I was looking
at the wrong database (wrong telnet session)! So your comments above apply
in this case.
> If you only ever store characters that are in 7-bit ASCII then none of
> this will affect you, and you can get away with broken combinations of
> encoding and locale. But if you'd like to sort characters outside the
> minimal ASCII set then you need to get it right ...
Tom, thanks for the answers above.
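To make the 7-bit ASCII point concrete, a small Python sketch (my own illustration, with made-up sample strings) shows that ASCII text encodes to the same bytes in UTF-8 and ISO 8859-15, while an accented character does not, which is where a mismatched encoding and locale would start to matter.

```python
# Illustrative sketch only: compare how the same text is encoded in
# UTF-8 and in ISO 8859-15. The sample strings are invented.

ascii_text = "collate"
accented = "é"

# Pure ASCII text yields identical bytes in both encodings.
same_for_ascii = ascii_text.encode("utf-8") == ascii_text.encode("iso-8859-15")

# A non-ASCII character is two bytes in UTF-8 but one byte in ISO 8859-15.
utf8_bytes = accented.encode("utf-8")          # b'\xc3\xa9'
latin9_bytes = accented.encode("iso-8859-15")  # b'\xe9'

# Reading the UTF-8 bytes as if they were ISO 8859-15 gives two unrelated
# characters, which is roughly how a mismatched locale would see the data.
misread = utf8_bytes.decode("iso-8859-15")

print(same_for_ascii, utf8_bytes, latin9_bytes, misread)
```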
I guess if I have some time I should build some different DBs with
different combinations of encoding and collations and summarise my
findings using different types of data and sort/search commands, in case
anyone else has the same level of confusion that I do...
John Sidney-Woollett