Re: garbage in psql -l

From: Roger Leigh <rleigh(at)codelibre(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: garbage in psql -l
Date: 2009-11-25 00:14:32
Message-ID: 20091125001431.GD14791@codelibre.net
Lists: pgsql-hackers

On Tue, Nov 24, 2009 at 05:43:00PM -0500, Tom Lane wrote:
> Roger Leigh <rleigh(at)codelibre(dot)net> writes:
> > On Tue, Nov 24, 2009 at 02:19:27PM -0500, Tom Lane wrote:
> >> I wonder whether the most prudent solution wouldn't be to prevent
> >> default use of linestyle=unicode if ~/.psqlrc hasn't been read.
>
> > This problem is caused when there's a mismatch between the
> > client encoding and the user's locale. We can detect this at
> > runtime and fall back to ASCII if we know they are incompatible.
>
> Well, no, that is *one* of the possible failure modes. I've hit others
> already in the short time that the patch has been installed. The one
> that's bit me most is that the locale environment seen by psql doesn't
> necessarily match what my xterm at the other end of an ssh connection
> is prepared to do --- which is something that psql simply doesn't have
> a way to detect. Again, this is something that's never mattered before
> unless one was really pushing non-ASCII data around, and even then it
> was often possible to be sloppy.

Sure, but this type of misconfiguration is entirely outside the
purview of psql. Everything else on the system, from man(1) to gcc,
emacs and vi, will be sending UTF-8 codes to your terminal for any
non-ASCII character they display. While psql using UTF-8 for its
tables is certainly exposing the problem, in reality it was already
broken, and it's not psql's "fault" for using functionality the
system said was available. It would equally break if you stored
non-ASCII characters in your UTF-8-encoded database and then ran
a SELECT query, since UTF-8 codes would again be sent to the
terminal.

For the specific case here, where the locale is KOI8-R, we can
determine at runtime that this isn't a UTF-8 locale and keep
using ASCII. I'll be happy to send in a patch to correct this
specific case.

At least on GNU/Linux, checking nl_langinfo(CODESET) is considered
definitive for testing which character set is available, and it's
the standard SUS/POSIX interface for querying the locale.
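
As a rough illustration of the check I have in mind (a minimal sketch,
not the actual patch): the helper names codeset_is_utf8 and
default_linestyle are hypothetical and do not exist in psql, and
accepting the "utf8" spelling is an assumption to cover platforms
that report the codeset without the hyphen.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical helper: returns 1 if the given codeset name denotes
 * UTF-8, 0 otherwise (including for NULL).
 */
static int
codeset_is_utf8(const char *codeset)
{
    if (codeset == NULL)
        return 0;
    /* "utf8" without the hyphen is accepted as an assumption */
    return strcasecmp(codeset, "UTF-8") == 0 ||
           strcasecmp(codeset, "utf8") == 0;
}

/*
 * Hypothetical helper: picks a default linestyle name from the
 * user's locale.  Note that it sets LC_CTYPE from the environment
 * as a side effect, so nl_langinfo reports the user's codeset
 * rather than that of the default "C" locale.
 */
static const char *
default_linestyle(void)
{
    setlocale(LC_CTYPE, "");
    return codeset_is_utf8(nl_langinfo(CODESET)) ? "unicode" : "ascii";
}
```

With LANG=ru_RU.KOI8-R, nl_langinfo(CODESET) would report a KOI8-R
codeset, so a check along these lines would leave the default at
ASCII.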

> I'd be more excited about finding a way to use linestyle=unicode by
> default if it had anything beyond cosmetic benefits. But it doesn't,
> and it's hard to justify ratcheting up the requirements for users to get
> their configurations exactly straight when that's all they'll get for it.

Once the missing nl_langinfo check is added, we will be going out of
our way to make sure that the system is capable of handling UTF-8.
This is, IMHO, the limit of how far /any/ tool should go to
handle things. Worrying about misconfigured terminals, something
which is entirely the user's responsibility, is I think a step too
far. Going down this road means you'll be artificially limited to
ASCII, and the whole point of using nl_langinfo is to allow sensible
autoconfiguration, which almost all programs do nowadays. I don't
think it makes sense to "penalise" the majority of users with
correctly-configured systems because a small minority have a
misconfigured terminal input encoding. It is 2009, and all
contemporary systems support Unicode, and for the majority it is the
default.

Every one of the GNU utilities, plus most other free software,
localises itself using gettext, which in a UTF-8 locale, even
English locales, will transparently recode its output into the
locale codeset. This hasn't resulted in major problems for
people using these tools; it's been this way for years now.

Regards,
Roger

--
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.
