From: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
---|---|
To: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
Cc: | moseley(at)hank(dot)org, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Mixing different LC_COLLATE and database encodings |
Date: | 2006-02-21 06:44:07 |
Message-ID: | 20060221064407.GA24481@svana.org |
Lists: | pgsql-general |
On Tue, Feb 21, 2006 at 10:27:15AM +0900, Tatsuo Ishii wrote:
> If you are considering allowing only UTF-16 (or some other single encoding)
> in the backend, I will strongly oppose the idea. We Japanese need native
> support for those encodings. Converting them to and from Unicode every time
> the backend and frontend talk to each other would be a serious performance
> hit. Moreover, the conversion is known not to be round-trip safe, which
> means some information will be lost during the conversion. Another point
> is the on-disk format: UTF-16 will require more storage than the local
> encodings, and UTF-8 will probably require even more.
I didn't say that we would support only UTF-16 in the backend. I said that
when doing comparisons in a non-C locale, you have to convert to UTF-16
to use ICU. If you don't want to use it, don't; it won't be required at
any point. It's just like the current situation on Win32: if you use UTF-8,
the strings have to be converted to UTF-16 prior to comparison.
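A minimal Python sketch of the conversion step described above, assuming UTF-8 input. ICU's C comparison API (for example `ucol_strcoll`) takes `UChar` buffers, i.e. UTF-16 code units, so each operand must be decoded and re-encoded before it can be compared. The function names are hypothetical, and the plain code-unit comparison is only a placeholder for where a real ICU collator would be called.

```python
import struct


def utf8_to_utf16_units(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into the list of UTF-16 code units an ICU-style API expects."""
    encoded = data.decode("utf-8").encode("utf-16-le")
    # Each UTF-16 code unit occupies two little-endian bytes.
    return list(struct.unpack(f"<{len(encoded) // 2}H", encoded))


def compare_utf8(a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1 after converting both operands to UTF-16.

    The code-unit comparison below is a placeholder (hypothetical); a real
    implementation would pass the UTF-16 buffers to an ICU collator instead.
    """
    ua, ub = utf8_to_utf16_units(a), utf8_to_utf16_units(b)
    return (ua > ub) - (ua < ub)


if __name__ == "__main__":
    print([hex(u) for u in utf8_to_utf16_units("日本".encode("utf-8"))])  # ['0x65e5', '0x672c']
    print(compare_utf8(b"abc", b"abd"))  # -1
```

Note that the conversion cost is paid for both operands on every call, which is what the following paragraph weighs.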
The only time any of this is required is *sorting*, and if you have an
index defined, it acts as a cache for the sorted values. Of course
there's a tradeoff, but unless you're sorting large datasets all day, I
doubt it'll be noticeable.
If you're not sorting, none of this is relevant to you.
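The tradeoff can be illustrated loosely in Python. This is an analogy, not PostgreSQL code: `collation_key` is a hypothetical stand-in for the conversion plus collation step, and the sketch counts how often it runs when keys are recomputed on every comparison versus computed once per value and reused.

```python
from functools import cmp_to_key


def collation_key(s: str) -> str:
    # Hypothetical stand-in for an expensive conversion + collation step.
    return s.casefold()


def sort_per_comparison(words):
    """Convert both operands on every comparison; return (result, conversions)."""
    calls = 0

    def cmp(a, b):
        nonlocal calls
        calls += 2
        ka, kb = collation_key(a), collation_key(b)
        return (ka > kb) - (ka < kb)

    return sorted(words, key=cmp_to_key(cmp)), calls


def sort_cached_keys(words):
    """Convert each word exactly once and reuse the result; return (result, conversions)."""
    calls = 0

    def key(s):
        nonlocal calls
        calls += 1
        return collation_key(s)

    return sorted(words, key=key), calls


if __name__ == "__main__":
    import random

    words = [f"Word{i:04d}" for i in range(1000)]
    random.Random(0).shuffle(words)
    a, n_cmp = sort_per_comparison(words)
    b, n_key = sort_cached_keys(words)
    assert a == b
    print(n_cmp, n_key)  # n_key is exactly 1000; n_cmp is much larger
```

An index goes further than cached keys, since it keeps the values already in order, but the counts show why the conversion cost only shows up when comparisons actually happen.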
> I have a feeling that ICU is good for applications, but is not for
> DBMSs.
I think providing a system where users can select from a large range
of possible collation orders, and if necessary specify their own, is a
worthy goal. Look at the complaints we get now and then from people who
choose en_US as their locale and are surprised when it gives them a
dictionary sort.
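To show the kind of surprise described above, here is a toy Python comparison of a C-locale byte-order sort with a dictionary-style sort. The `toy_dictionary_key` function is a hypothetical simplification (case-folded letters and digits first, punctuation ignored, original string as a tie-breaker); it is not glibc's en_US algorithm, and its tie-break order is arbitrary.

```python
def c_locale_key(s: str) -> bytes:
    # The C locale compares raw bytes, so "B" (0x42) sorts before "a" (0x61).
    return s.encode("utf-8")


def toy_dictionary_key(s: str) -> tuple[str, str]:
    # First level: letters and digits only, case-folded (punctuation ignored).
    primary = "".join(ch for ch in s.casefold() if ch.isalnum())
    # Tie-break on the original string so the order is total.
    return (primary, s)


words = ["apple", "Banana", "a-b", "ab", "Apple"]
print(sorted(words, key=c_locale_key))        # ['Apple', 'Banana', 'a-b', 'ab', 'apple']
print(sorted(words, key=toy_dictionary_key))  # ['a-b', 'ab', 'Apple', 'apple', 'Banana']
```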
ICU allows users to take an existing collation and tweak it if it
doesn't quite match their expectations. You think this is not useful
for a DBMS?
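As a rough illustration of tailoring, the Python sketch below takes a base alphabet and adjusts it so that "ch" sorts as a single letter after "h", as in Czech. In ICU's rule syntax this kind of tweak is written along the lines of `&h < ch`; the code does not use ICU, and `tailor` and `make_key` are hypothetical helpers for a toy collation only.

```python
BASE_ALPHABET = list("abcdefghijklmnopqrstuvwxyz")


def tailor(alphabet: list[str], after: str, new_unit: str) -> list[str]:
    """Return a copy of `alphabet` with `new_unit` sorting directly after `after`."""
    result = list(alphabet)
    result.insert(result.index(after) + 1, new_unit)
    return result


def make_key(alphabet: list[str]):
    rank = {unit: i for i, unit in enumerate(alphabet)}
    # Try longer units (e.g. "ch") before single letters.
    multi = sorted((u for u in alphabet if len(u) > 1), key=len, reverse=True)

    def key(word: str) -> list[int]:
        word = word.casefold()
        out, i = [], 0
        while i < len(word):
            for unit in multi:
                if word.startswith(unit, i):
                    out.append(rank[unit])
                    i += len(unit)
                    break
            else:
                # Characters outside the alphabet all sort last (toy behaviour).
                out.append(rank.get(word[i], len(alphabet)))
                i += 1
        return out

    return key


words = ["chata", "hrad", "cesta", "ibis"]
print(sorted(words, key=make_key(BASE_ALPHABET)))
# ['cesta', 'chata', 'hrad', 'ibis']
print(sorted(words, key=make_key(tailor(BASE_ALPHABET, "h", "ch"))))
# ['cesta', 'hrad', 'chata', 'ibis']
```

The point is that the user starts from an existing order and changes only the part that does not match expectations, rather than defining a collation from scratch.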
Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.