Re: UTF8 or Unicode

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, dpage(at)vale-housing(dot)co(dot)uk, oliver(at)opencloud(dot)com, zakkr(at)zf(dot)jcu(dot)cz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: UTF8 or Unicode
Date: 2005-02-25 04:51:16
Message-ID: 200502250451.j1P4pHi06087@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches

Tatsuo Ishii wrote:
> I do not object to changing UNICODE->UTF-8, but all these discussions
> sound a little bit funny to me.
>
> If you want to blame UNICODE, you should blame LATIN1 etc. as
> well. LATIN1 (ISO-8859-1) is actually a character set name, not an
> encoding name. ISO-8859-1 can be encoded in an 8-bit single-byte
> stream. But it can be encoded in 7-bit too. So when we refer to
> LATIN1 (ISO-8859-1), it's not clear if it's encoded in 7-bit or 8-bit.

Wow, Tatsuo has a point here. Looking at encnames.c, I see:

"UNICODE", PG_UTF8

but also:

"WIN", PG_WIN1251
"LATIN1", PG_LATIN1

and I see conversions for those:

"iso88591", PG_LATIN1
"win", PG_WIN1251

So I see what he is saying: we are not consistent about whether we
favor the official names or the common names.
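
To make the inconsistency concrete, here is a rough sketch of a name lookup
over just the entries quoted above. It is not the code in encnames.c: the
Python tables, the functions normalize(), lookup() and display_name(), and the
rule of lowercasing and dropping punctuation before matching are all
assumptions made for illustration.

```python
# Canonical display names, limited to the entries quoted in this message.
CANONICAL_NAMES = {
    "PG_UTF8": "UNICODE",
    "PG_WIN1251": "WIN",
    "PG_LATIN1": "LATIN1",
}

# Alias entries, limited to the ones quoted in this message.
ALIASES = {
    "iso88591": "PG_LATIN1",
    "win": "PG_WIN1251",
}


def normalize(name):
    """Lowercase and keep only ASCII letters and digits (assumed rule)."""
    return "".join(ch for ch in name.lower() if ch.isascii() and ch.isalnum())


def lookup(name):
    """Return the encoding identifier for a user-supplied name, or None."""
    key = normalize(name)
    if key in ALIASES:
        return ALIASES[key]
    # Assumption: a canonical name also matches itself after normalization.
    for enc_id, canonical in CANONICAL_NAMES.items():
        if normalize(canonical) == key:
            return enc_id
    return None


def display_name(name):
    """Return the name shown to users for whatever encoding `name` resolves to."""
    enc_id = lookup(name)
    return CANONICAL_NAMES[enc_id] if enc_id is not None else None
```

In this sketch, lookup("ISO-8859-1") and lookup("latin1") both resolve to
PG_LATIN1, yet display_name() reports the common name LATIN1 rather than the
official one, which is the same kind of mismatch as UNICODE vs. UTF-8.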

I will work on a patch that people can review and test.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
