Encoding conversions in psql

From: Mathijs Brands <mathijs(at)ilse(dot)net>
To: pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject: Encoding conversions in psql
Date: 2004-01-08 14:21:19
Message-ID: 20040108142119.GA13264@ilse.net
Lists: pgsql-hackers

Howdy,

Can anyone explain to me when psql tries to convert between encodings?
It seems to disregard encodings set with SET CLIENT_ENCODING.

The following reproduces the behaviour I'm seeing:

1. create a UNICODE database

2. run the following:
set client_encoding to latin1;
create table bla(a text);
insert into bla values('meëep');

3. try the following from psql:
Welcome to psql 7.3.4, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit

mathijs=# select * from bla;
a
-------
meëep
(1 row)

mathijs=# set client_encoding = latin1;
SET
mathijs=# select * from bla;
a
------
meep
(1 row)

mathijs=# \encoding latin1
mathijs=# select * from bla;
a
-------
meëep
(1 row)

After setting CLIENT_ENCODING, the middle character (the ë) gets dropped.
To me it seems like psql treats the data it gets from the server as
UTF8: it tries to decode it as UTF8, hits the ë (which, sent as the
single Latin-1 byte 0xEB, is indeed not valid UTF8) and drops it.
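A minimal Python sketch of the suspected mechanism, for illustration only. It assumes, as the report does, that the server sends Latin-1 bytes after SET client_encoding and that the client decodes them as UTF-8 while discarding invalid bytes. Python's errors="ignore" is a stand-in for that behaviour, not psql's actual code.

```python
# The value from the report, encoded the two ways the client might receive it.
text = "meëep"
latin1_bytes = text.encode("latin-1")  # what the server sends once client_encoding is LATIN1
utf8_bytes = text.encode("utf-8")      # what it sends while client_encoding is UNICODE

# A client that still assumes UTF-8 and silently skips bytes it cannot decode:
# 0xEB starts a 3-byte UTF-8 sequence, but the next byte ('e') is not a
# continuation byte, so only 0xEB is rejected and dropped.
misread = latin1_bytes.decode("utf-8", errors="ignore")
print(latin1_bytes, utf8_bytes, misread)
```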

My question is: why does psql seem to think it's receiving UTF8 data
-after- I've changed the client_encoding? I've checked with a network
sniffer that the data the server returns is the same whether or not I
use \encoding (as expected). Is this behaviour a bug? If not, it does
not seem very obvious to me; I would expect psql to keep track of the
client encoding in effect between the server and the client.
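One way a client could keep track of the encoding, sketched in Python under stated assumptions. In frontend/backend protocol 3.0 (PostgreSQL 7.4 and later, so not the 7.3.4 psql above), the server reports changes to client_encoding with a ParameterStatus ('S') message. The class EncodingTracker, the helper parameter_status and the partial name-to-codec table are hypothetical and do not describe how psql or libpq is implemented.

```python
import struct

# Hypothetical, partial map from PostgreSQL encoding names to Python codecs.
PG_TO_PYTHON = {"UNICODE": "utf-8", "UTF8": "utf-8", "LATIN1": "latin-1"}


def parameter_status(name: str, value: str) -> bytes:
    """Build a protocol 3.0 ParameterStatus message (used here to simulate the server)."""
    body = name.encode("ascii") + b"\0" + value.encode("ascii") + b"\0"
    # Byte1('S'), then an Int32 length that counts itself but not the type byte.
    return b"S" + struct.pack("!i", 4 + len(body)) + body


class EncodingTracker:
    """Hypothetical client-side tracker that decodes rows with the current client_encoding."""

    def __init__(self, initial: str = "UNICODE") -> None:
        self.client_encoding = initial

    def handle_message(self, msg: bytes) -> None:
        # Only ParameterStatus ('S') messages matter here; ignore everything else.
        if msg[:1] != b"S":
            return
        (length,) = struct.unpack("!i", msg[1:5])
        name, value, _ = msg[5:1 + length].split(b"\0", 2)
        if name == b"client_encoding":
            self.client_encoding = value.decode("ascii")

    def decode(self, raw: bytes) -> str:
        # Decode strictly: a mismatch raises instead of silently dropping bytes.
        return raw.decode(PG_TO_PYTHON[self.client_encoding])


tracker = EncodingTracker()
tracker.handle_message(parameter_status("client_encoding", "LATIN1"))
row = tracker.decode(b"me\xebep")  # Latin-1 bytes, as sent after SET client_encoding
```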

Cheers,

Mathijs
