Re: Unicode UTF-8 table formatting for psql text output

From: Roger Leigh <rleigh(at)codelibre(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, "Brad T(dot) Sliger" <brad(at)sliger(dot)org>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Selena Deckelmann <selenamarie(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Roger Leigh <rleigh(at)debian(dot)org>
Subject: Re: Unicode UTF-8 table formatting for psql text output
Date: 2009-09-30 14:41:11
Message-ID: 20090930144111.GA4486@codelibre.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 29, 2009 at 04:28:57PM -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote:
> >> The bigger question is exactly how we expect this stuff to interact with
> >> pg_regress' --no-locale switch. We already do clear all these variables
> >> when --no-locale is specified. I am wondering just what --locale is
> >> supposed to do, and whether selectively lobotomizing the LC stuff has
> >> any real use at all.
>
> > We should do the LANG or LC_CTYPE thing only on the client,
> > unconditionally. The --no-locale/--locale options should primarily
> > determine what the temporary server uses.
>
> Well, that seems fairly reasonable, but it's going to require some
> refactoring of pg_regress. The initialize_environment function
> determines what happens in both the client and the temp server.

Two possible approaches to fix the tests are as follows:

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index f2f9603..74cdaa2 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -711,8 +711,7 @@ initialize_environment(void)
* is actually called.)
*/
unsetenv("LANGUAGE");
- unsetenv("LC_ALL");
- putenv("LC_MESSAGES=C");
+ putenv("LC_ALL=C");

/*
* Set multibyte as requested

Here we just force the locale to C. This does have the disadvantage
that --no-locale is made redundant, and any tests which are dependent
upon locale (if any?) will be run in the C locale.

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index f2f9603..65fb49a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -712,6 +712,7 @@ initialize_environment(void)
*/
unsetenv("LANGUAGE");
unsetenv("LC_ALL");
+ putenv("LC_CTYPE=C");
putenv("LC_MESSAGES=C");

/*

Here we set LC_CTYPE to C in addition to LC_MESSAGES (and for much the
same reasons). However, if you test on non-C locales to check for
issues with other locale codesets, those tests are all going to be
forced to use ASCII. Is this a problem in practice?

From the POV of my patch, it's working as designed: if the locale
codeset is UTF-8 it's outputting UTF-8. But, due to the way the
test machinery is looking at the output, this is breaking the tests.
I'm not sure what I can do with my patch to make it behave differently
that is both compatible with its intended use and not break the tests--
this is really just breaking an assumption in the testing code that
the test output will always be ASCII.

Forcing the LC_CTYPE to C will force ASCII output and work around this
problem with the tests. Another approach would be to let psql know
it's being run in a test environment with a PG_TEST or some other
environment variable which we can check for and use to turn off UTF-8
output if set. Would that be better?

Regards,
Roger

--
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2009-09-30 14:47:57 Re: Unicode UTF-8 table formatting for psql text output
Previous Message Alvaro Herrera 2009-09-30 14:34:29 Re: TODO item: Allow more complex user/database default GUC settings