Re: Mixing different LC_COLLATE and database encodings

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: moseley(at)hank(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: Mixing different LC_COLLATE and database encodings
Date: 2006-02-21 06:44:07
Message-ID: 20060221064407.GA24481@svana.org
Lists: pgsql-general

On Tue, Feb 21, 2006 at 10:27:15AM +0900, Tatsuo Ishii wrote:
> If you are considering allowing only UTF-16 or some other single
> encoding in the backend, I am strongly against the idea. We Japanese
> need native support for those encodings. Converting those encodings to
> and from Unicode every time the backend and frontend communicate would
> be a serious performance hit. Moreover, the conversion is known not to
> be round-trip safe, which means some information will be lost during
> the conversion. Another point is the on-disk format. UTF-16 will
> require more storage than local encodings. UTF-8 will probably require
> more.

I didn't say that we would only support UTF-16 in the backend; I said
that when doing comparisons in a non-C locale, you have to convert to
UTF-16 to use ICU. If you don't want to use it, don't; it's not going to
be required at any point. It's just like the current situation on Win32:
if you use UTF-8, it has to be converted to UTF-16 prior to string
comparison.
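
As a rough illustration of "convert only at comparison time" (a sketch in
Python, not PostgreSQL or ICU code; the function and variable names are made
up, and comparing raw UTF-16 code units is only a stand-in for a real
collator), the stored values stay in the database encoding and UTF-16 exists
only as a transient sort key:

```python
# Hypothetical sketch: values remain stored as UTF-8 bytes; UTF-16 is
# produced only for the duration of the comparison.

def utf16_sort_key(stored: bytes) -> bytes:
    """Transcode a stored UTF-8 value to UTF-16 (big-endian) for comparison only."""
    return stored.decode("utf-8").encode("utf-16-be")

# Assumed sample data, stored in the "database encoding" (UTF-8 here).
stored_values = [s.encode("utf-8") for s in ["b", "é", "a"]]

# Stand-in for a collator: this orders by raw UTF-16 code units, which is
# NOT what ICU does, but the conversion happens at the same point.
ordered = sorted(stored_values, key=utf16_sort_key)

print([v.decode("utf-8") for v in ordered])
```

The stored bytes are never rewritten; only the sort pays for the conversion.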

The only time any of this is required is *sorting*, and if you have an
index defined, it acts as a cache for the sorted values. Of course
there's a tradeoff, but unless you're sorting large datasets all day I
doubt it'll be noticeable.
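
To make the "index as a cache" point concrete, here is a toy sketch in
Python. Everything in it is hypothetical (the SortedIndex class, the key
function, the counter), and it deliberately simplifies: a real B-tree still
performs comparisons on insert and lookup. It only shows that once values are
kept in order, reading them back sorted needs no further conversion work:

```python
import bisect

key_computations = 0  # counts how often the (assumed) costly conversion runs

def expensive_sort_key(value: str) -> bytes:
    """Stand-in for a costly collation key; here just UTF-16 code units."""
    global key_computations
    key_computations += 1
    return value.encode("utf-16-be")

class SortedIndex:
    """Toy ordered structure: pays for the sort key once, at insert time."""

    def __init__(self):
        self._entries = []  # list of (key, value) pairs, kept sorted

    def insert(self, value: str) -> None:
        bisect.insort(self._entries, (expensive_sort_key(value), value))

    def in_order(self) -> list:
        # Reading in sorted order does no key computation at all.
        return [value for _key, value in self._entries]

index = SortedIndex()
for v in ["pear", "apple", "fig"]:
    index.insert(v)

first = index.in_order()
second = index.in_order()
print(first, key_computations)
```

Repeated ordered reads leave the counter unchanged; the cost was paid when
the rows were inserted.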

If you're not sorting, none of this is relevant to you.

> I have a feeling that ICU is good for applications, but is not for
> DBMSs.

I think providing a system where users are able to select out of a
large range of possible collation orders and if necessary specify their
own is a worthy goal. Look at the complaints we get now and then of
people who choose en_US as their locale and are surprised when it gives
them a dictionary sort.
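
The surprise can be sketched in a few lines of Python. This is only an
approximation under stated assumptions: plain code point order stands in for
the C locale, and case-folding stands in for a dictionary-style order (a real
en_US collation has more rules than this, and the word list is made up):

```python
# Assumed sample data for illustration.
words = ["apple", "Zebra", "cherry", "Banana"]

# C-locale-like order: compare by code point, so all uppercase letters
# sort before all lowercase letters.
c_like = sorted(words)

# Dictionary-like order: case is ignored when comparing, so words
# interleave the way a reader of a dictionary would expect.
dictionary_like = sorted(words, key=str.casefold)

print(c_like)
print(dictionary_like)
```

Neither order is wrong; they answer different questions, which is why being
able to choose the collation matters.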

ICU allows users to take an existing collation and tweak it if it
doesn't quite match their expectations. You think this is not useful
for a DBMS?

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
