psql weird behaviour with charset encodings

From: hernan gonzalez <hgonzalez(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: psql weird behaviour with charset encodings
Date: 2010-05-07 21:21:05
Message-ID: v2m48692c2d1005071421o21dd177dkaaab23ab4013f85f@mail.gmail.com
Lists: pgsql-general pgsql-hackers

(Disclaimer: I've been using PostgreSQL for quite a long time. I
usually deal with non-ASCII LATIN9 characters,
and that has never been a problem until now.)

My issue, summarized: when psql is invoked by a user whose locale
differs from that of the database, the tabular output
is wrong for some text fields. The weird thing is that those text
fields are not just garbled, but empty. Weirder still,
this does not happen in the expanded output format (\x). Apparently
it's not a terminal problem (everything displays correctly after \x),
nor a client_encoding issue (same reason). So...?

Details follow.
My scenario: Fedora 12, PostgreSQL 8.4.3 compiled from source.

Database encoding (global): LATIN9
User postgres locale: LANG=en_US.iso885915
User root locale: LANG=en_US.UTF-8

When I connect as the postgres user, everything is fine.
When I connect as root, it is not... except with \x.
Example (here the last_name field has one non-ASCII character, N WITH TILDE):

========================================================================

[root(at)myserver ~]# su - postgres
[postgres(at)myserver ~]$ psql db
psql (8.4.3)
db=# \set
...
ENCODING = 'LATIN9'
db=# select first_name,last_name,birth_date from persons where rid= 143;
 first_name  | last_name | birth_date
--------------+-----------+------------
Guillermo    | Calcaño   | 1996-09-30
db=# \x
db=# select first_name,last_name,birth_date from persons where rid= 143;
-[ RECORD 1 ]------------
first_name | Guillermo
last_name  | Calcaño
birth_date | 1996-09-30

[root(at)myserver ~]# /usr/local/pgsql/bin/psql -U postgres db
psql (8.4.3)
db=# \set
...
ENCODING = 'LATIN9'
db=# select first_name,last_name,birth_date from persons where rid= 143;
 first_name  | last_name | birth_date
--------------+-----------+------------
Guillermo    |    | 1996-09-30
(1 row)
db=# \x
db=# select first_name,last_name,birth_date from persons where rid= 143;
-[ RECORD 1 ]------------
first_name | Guillermo
last_name  | Calcaño
birth_date | 1996-09-30

==================================================================

It looks as if psql, in tabular form, needs to compute the length of
the field to output, and for this it uses the user's locale (not
the client_encoding, mind you, but the locale of the user who is
running the psql process). In case of a mismatch,
it cannot decode the string and compute the length, so... it assumes
length=0 (?)
Is this the expected behaviour? Has this behaviour changed recently?
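
To make the guess above concrete, here is a minimal Python sketch. It is only an illustration of the hypothesis, not psql's actual code: the helper name display_width is made up, and I am assuming that the length computation amounts to decoding the bytes in the locale's charset. It shows that the LATIN9 bytes for the last_name value are valid under ISO-8859-15 but cannot be decoded as UTF-8.

```python
# Illustration only: a hypothetical helper, not how psql is implemented.
from typing import Optional


def display_width(raw: bytes, locale_charset: str) -> Optional[int]:
    """Return the character count of raw under locale_charset,
    or None if the bytes are not valid in that charset."""
    try:
        return len(raw.decode(locale_charset))
    except UnicodeDecodeError:
        return None


# Bytes as a LATIN9 client would receive them (N WITH TILDE is 0xF1).
raw = "Calcaño".encode("iso8859_15")

print(display_width(raw, "iso8859_15"))  # 7: matches the postgres user's locale
print(display_width(raw, "utf-8"))       # None: 0xF1 is not valid UTF-8 here
```

If something similar happens inside the tabular formatter, a failed decode treated as "nothing to print" would match the empty column seen from the root session.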

--
Hernán J. González
http://hjg.com.ar/
