Re: character encoding in StartupMessage

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, John DeSoi <desoi(at)pgedit(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: character encoding in StartupMessage
Date: 2006-02-28 16:14:17
Message-ID: 20060228161417.GE535@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 28, 2006 at 12:05:17PM -0300, Alvaro Herrera wrote:
> Martijn van Oosterhout wrote:
>
> > This may be the only solution. Converting everything to UTF-8 has
> > issues because some encodings are not roundtrip-safe (Enc -> UTF8 -> Enc
> > gives you a different string than you started with). There's probably
> > no encoding round-trip safe with every other encoding.
>
> Is this still true? If I remember clearly, Tatsuo-san had asserted that
> this was the case, but later he said there was some bug in our
> conversion routines or the conversion tables. So maybe now that those
> things are fixed (they are, aren't they?) there _is_ a safe roundtrip
> from anything to UTF8 and back.

I beleive so. If use the ICU Converter Explorer [1] to examine some of
the encodings we support, they have "Contains ambiguous aliases? TRUE".
This means that there are multiple converters that claim to support that
encoding, though they produce different results.

The UTF-8 and Unicode FAQ [2] also lists some issues with EUC-JP saying
that the converters had to be modified to make round-trip conversion
work. However, not all converters work the same.

Anyway, maybe it's not a big problem anymore. The ISO-2022 series is
definitly not round-trip compatable [3] but I don't think we support
them anyway. I think the only issue is if the mappings postgres uses
internally don't match what the user expects, but I don't think there's
much we can do about that...

[1] http://www-950.ibm.com/software/globalization/icu/demo/converters
[2] http://www.cl.cam.ac.uk/~mgk25/unicode.html
[3] http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-02-28 16:16:27 Re: Dead Space Map
Previous Message Jim C. Nasby 2006-02-28 16:14:01 Re: Dead Space Map