Re: A rough roadmap for internationalization fixes

From: Kurt Roeckx <Q(at)ping(dot)be>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: db(at)zigo(dot)dhs(dot)org, peter_e(at)gmx(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: A rough roadmap for internationalization fixes
Date: 2003-11-25 18:13:36
Message-ID: 20031125181336.GA13791@ping.be
Lists: pgsql-hackers

On Tue, Nov 25, 2003 at 08:40:57PM +0900, Tatsuo Ishii wrote:
> > On Tue, 25 Nov 2003, Peter Eisentraut wrote:
> >
> > I've always thought unicode was enough to even represent Japanese. Then
> > the client encoding can be something else that we can convert to. In any
> > way, the encoding of the message catalog has to be known to the system so
> > it can be converted to the correct encoding for the client.
>
> I'm tired of explaining that Unicode is not that perfect.

Maybe it should be explained what the problems really are,
instead of saying it "isn't perfect"?

From what I understand, the only problem is in converting between
a "legacy" encoding and Unicode, in either direction, and there is
no problem if you avoid the conversion altogether.

The conversion problem arises because what a legacy encoding
represents as a single character can correspond to several
different characters (code points) in Unicode.

Some examples people might understand are:
- µ: In ISO 8859-1 it's char 0xB5. In Unicode it can be U+00B5 (micro
sign) or U+03BC (Greek small letter mu).
- Å: In ISO 8859-1 it's char 0xC5. In Unicode it can be U+00C5 (Latin
capital letter A with ring above) or U+212B (angstrom sign).
- Ω: The ohm sign (U+2126) vs. the Greek capital letter omega (U+03A9).
- Quotation marks: You have the left double quote, the right double
quote, and a few others.
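
To make the examples above concrete, here is a minimal sketch in Python
(the language and the helper names fits_latin1 and describe are only
chosen for illustration; nothing here is PostgreSQL code). It uses the
standard unicodedata module to show that each pair consists of two
distinct code points, that at most one of them exists in ISO 8859-1,
and that NFKC normalization folds the "sign" variant into the letter:

```python
import unicodedata

# (sign-like code point, the letter it is commonly confused with)
PAIRS = [
    ("\u00b5", "\u03bc"),  # MICRO SIGN vs GREEK SMALL LETTER MU
    ("\u212b", "\u00c5"),  # ANGSTROM SIGN vs LATIN CAPITAL LETTER A WITH RING ABOVE
    ("\u2126", "\u03a9"),  # OHM SIGN vs GREEK CAPITAL LETTER OMEGA
]


def fits_latin1(ch):
    """Return True if ch can be encoded in ISO 8859-1 (hypothetical helper)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def describe(ch):
    """Format a character as 'U+XXXX NAME' (hypothetical helper)."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


for sign, letter in PAIRS:
    print(describe(sign), "| in ISO 8859-1:", fits_latin1(sign))
    print(describe(letter), "| in ISO 8859-1:", fits_latin1(letter))
    # NFKC maps the sign variant onto the letter, so the distinction
    # is lost if a converter normalizes its input.
    folded = unicodedata.normalize("NFKC", sign) == letter
    print("  NFKC(sign) == letter:", folded)
```

So a converter going from ISO 8859-1 to Unicode has to pick one of the
two candidates, and a round trip only works if the reverse direction
accepts both.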

> Another gotcha
> with Unicode is that the UTF-8 encoding (which we currently use) consumes 3
> bytes for each Kanji character, while other encodings consume only 2
> bytes. IMO a 3/2 storage ratio cannot be neglected for database use.

You can encode Unicode in different ways, and UTF-8 is only one
of them. Is there a problem with using UCS-2 (except that it
would require more storage for ASCII)?
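
As a rough illustration of the storage trade-off, here is a small
Python sketch (again only for illustration; encoded_size is a
hypothetical helper). Python has no codec named UCS-2, so I am
assuming "utf-16-be" as a stand-in: for characters in the Basic
Multilingual Plane it produces the same 2 bytes per character that
UCS-2 would.

```python
def encoded_size(text, codec):
    """Return the number of bytes text occupies in the given codec."""
    return len(text.encode(codec))


# Two Kanji characters (U+6F22, U+5B57) and two ASCII characters.
SAMPLES = {"kanji": "\u6f22\u5b57", "ascii": "ab"}

# "utf-16-be" stands in for UCS-2 here (an assumption, see above).
CODECS = ["utf-8", "euc_jp", "shift_jis", "utf-16-be"]

for label, text in SAMPLES.items():
    for codec in CODECS:
        print("%-5s %-10s %d bytes" % (label, codec, encoded_size(text, codec)))
```

For the two Kanji this gives 6 bytes in UTF-8 against 4 in EUC-JP,
Shift_JIS and the 2-byte encoding, while the two ASCII characters take
2 bytes everywhere except the 2-byte encoding, where they take 4.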

Kurt
