Skip site navigation (1) Skip section navigation (2)

Re: Unicode UTF-8 table formatting for psql text output

From: Roger Leigh <rleigh(at)codelibre(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, "Brad T(dot) Sliger" <brad(at)sliger(dot)org>,pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>,Selena Deckelmann <selenamarie(at)gmail(dot)com>,Alvaro Herrera <alvherre(at)commandprompt(dot)com>,Roger Leigh <rleigh(at)debian(dot)org>
Subject: Re: Unicode UTF-8 table formatting for psql text output
Date: 2009-09-30 15:00:04
Message-ID: 20090930150004.GB4486@codelibre.net (view raw or flat)
Thread:
Lists: pgsql-hackers
On Tue, Sep 29, 2009 at 04:32:49PM -0400, Tom Lane wrote:
> Roger Leigh <rleigh(at)codelibre(dot)net> writes:
> >> C locale means POSIX behavior and nothing but.
> 
> > Indeed it does.  However, making LC_CTYPE be UTF-8 rather than
> > ASCII is both possible and still strictly conforming to the
> > letter of the standard.  There would be some collation and
> > other restrictions ("digit" and other character classes would
> > be contrained to the ASCII characters compared with other UTF-8
> > locales). However, any existing programs using ASCII would continue
> > to function without any changes to their behaviour.  The only
> > observable change will be that nl_langinfo(CODESET) will return
> > UTF-8, and it will be valid for programs to use UTF-8 encoded
> > text in formatted print functions, etc..
> 
> I really, really don't believe that that meets either the letter or
> the spirit of the C standard, at least not if you are intending to
> silently substitute LC_CTYPE=UTF8 when the program has specified
> C/POSIX locale.  (If this is just a matter of what the default
> LANG environment is, of course you can do anything.)

We have spent some time reading the relevant standards documents
(C, POSIX, SUSv2, SUSv3) and haven't found anything yet that would
preclude this.  While they all specify minimal requirements for
what the C locale character set must provide (and POSIX/SUS are the
most strict, specifying ASCII outright for each 0-127 codepoint),
these are the minimal requirements for the locale, and
implementation-specific extensions to ASCII are allowed, which would
therefore permit UTF-8.  Note that LC_CTYPE=C is not required to
specify ASCII in any standard (though POSIX/SUS require that it
must contain ASCII as a subset of the whole set).

The language in SUSv2 in fact explicitly states that this is
allowed.  In fact, I've seen documentation that some UNIX systems such
as HPUX already do have a UTF-8 C locale as an option.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

In response to

Responses

pgsql-hackers by date

Next:From: Andrew DunstanDate: 2009-09-30 15:03:05
Subject: Re: Unicode UTF-8 table formatting for psql text output
Previous:From: Tom LaneDate: 2009-09-30 14:57:32
Subject: Re: Unicode UTF-8 table formatting for psql text output

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group