From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us |
Cc: | cpisto(at)rvweb(dot)com, pgsql-general(at)postgresql(dot)org |
Subject: | Re: lc_collate issue |
Date: | 2007-08-25 02:18:11 |
Message-ID: | 20070825.111811.35679240.t-ishii@sraoss.co.jp |
Lists: | pgsql-general |
> Cody Pisto <cpisto(at)rvweb(dot)com> writes:
> > If initdb was done with a C locale, and thus lc_collate and friends
> > were all C, but the database and client encoding was set to UTF-8,
> > would postgres convert data on the fly from UTF-8(storage) to ASCII for
> > sorting or would things just blow up when a >1 byte character hit the mix?
>
> No, C locale just sorts the bytes. It won't "blow up". Whether it will
> give you a sort ordering you like for multibyte characters is a
> different question.
Yup.
For example, the LATIN1 part of UTF-8 (UNICODE) is physically ordered the
same as ISO 8859-1. So if you consider the ISO 8859-1 order "natural",
then the sort order of UTF-8 is fine as well. However, the order of the
CJK part of UTF-8 is totally different from that of the original character
sets (almost random), so you need to use convert() to convert UTF-8 to the
original encoding to get a "natural" sort order. I don't think you are
interested in the CJK part, though.
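
To make the point above concrete, here is a small sketch in Python (not
PostgreSQL code) that imitates a byte-wise, C-locale-style comparison by
sorting strings on their encoded bytes. The helper name bytewise_sorted and
the sample words are made up for illustration, and EUC-JP is only an assumed
stand-in for "the original encoding" of Japanese text.

```python
# Sketch: imitate a byte-wise (C locale style) sort by comparing encoded bytes.
# bytewise_sorted and the sample data are hypothetical, for illustration only.

def bytewise_sorted(words, encoding):
    """Sort words by the raw bytes of their encoded form (unsigned, memcmp-like)."""
    return sorted(words, key=lambda w: w.encode(encoding))

# Characters that all exist in ISO 8859-1.
latin_words = ["zebra", "Ärger", "été", "apple", "Zürich"]
# Three kanji whose Unicode order differs from their JIS X 0208 order.
cjk_words = ["一", "亜", "愛"]

latin_utf8 = bytewise_sorted(latin_words, "utf-8")
latin_8859 = bytewise_sorted(latin_words, "iso-8859-1")

cjk_utf8 = bytewise_sorted(cjk_words, "utf-8")
cjk_eucjp = bytewise_sorted(cjk_words, "euc_jp")

print("LATIN1 order identical:", latin_utf8 == latin_8859)
print("CJK as UTF-8 bytes:   ", cjk_utf8)
print("CJK as EUC-JP bytes:  ", cjk_eucjp)
```

For these samples the two LATIN1 lists should agree, because UTF-8 bytes
sort in code point order and ISO 8859-1 bytes are the code points
themselves. The kanji lists should differ, since EUC-JP follows the
JIS X 0208 arrangement rather than Unicode code point order.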
--
Tatsuo Ishii
SRA OSS, Inc. Japan