Re: [HACKERS] UTF8 or Unicode

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Markus Bertheau ?" <twanger(at)bluetwanger(dot)de>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, dpage(at)vale-housing(dot)co(dot)uk, oliver(at)opencloud(dot)com, zakkr(at)zf(dot)jcu(dot)cz, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] UTF8 or Unicode
Date: 2005-03-02 17:54:20
Message-ID: 11919.1109786060@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>> The correct encoding name is "UTF-8".

> True, but Peter says the ANSI standard calls it UTF8 so that's what I
> used.

What SQL99 actually says is

    - UTF8 specifies the name of a character repertoire that consists
      of every character represented by The Unicode Standard Version
      2.0 and by ISO/IEC 10646 UTF-8, where each character is encoded
      using the UTF-8 encoding, occupying from 1 (one) through 6
      octets.

That is, "UTF8" is an identifier chosen to refer to an encoding which
they know perfectly well is really called UTF-8. We should probably
follow the same convention of using UTF8 in code identifiers and UTF-8
in documentation. In particular, UTF_8 with an underscore is sanctioned
by nobody and should be avoided.
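
As a rough illustration of that convention, here is a minimal C sketch. All names in it (enc_id, ENC_UTF8, enc_lookup, enc_display_name) are hypothetical and are not taken from the PostgreSQL source tree. It assumes the code-level identifier is spelled UTF8, the documentation-level name is "UTF-8", and input spellings are matched case-insensitively with hyphens ignored, so that "UTF_8" is simply not recognized.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names for illustration; not taken from the PostgreSQL source. */
typedef enum
{
    ENC_UNKNOWN = 0,
    ENC_UTF8                /* identifier form: no hyphen, no underscore */
} enc_id;

/* Documentation form of the name, as it would appear in user-facing text. */
static const char *
enc_display_name(enc_id id)
{
    return (id == ENC_UTF8) ? "UTF-8" : "(unknown)";
}

/*
 * Compare an input spelling against "UTF8", ignoring case and hyphens.
 * Underscores are not skipped, so "UTF_8" does not match.
 */
static enc_id
enc_lookup(const char *name)
{
    static const char canon[] = "UTF8";
    size_t      i = 0;

    if (name == NULL)
        return ENC_UNKNOWN;

    for (; *name != '\0'; name++)
    {
        if (*name == '-')
            continue;           /* accept "UTF-8" as well as "UTF8" */
        if (canon[i] == '\0' ||
            toupper((unsigned char) *name) != canon[i])
            return ENC_UNKNOWN; /* too long, or a character differs */
        i++;
    }

    /* Match only if the whole canonical name was consumed. */
    return (canon[i] == '\0') ? ENC_UTF8 : ENC_UNKNOWN;
}
```

With this sketch, enc_lookup("UTF-8") and enc_lookup("utf8") both yield ENC_UTF8, while enc_lookup("UTF_8") yields ENC_UNKNOWN. Whether a real implementation should be this strict about input spellings is a separate policy choice; the point is only that the identifier and the documented name are kept distinct.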

regards, tom lane
