More message encoding woes

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: More message encoding woes
Date: 2009-03-30 12:52:37
Message-ID: 49D0C095.8000304@enterprisedb.com
Lists: pgsql-hackers

latin1db=# SELECT version();
                                      version
-----------------------------------------------------------------------------------
 PostgreSQL 8.3.7 on i686-pc-linux-gnu, compiled by GCC gcc (Debian 4.3.3-5) 4.3.3
(1 row)

latin1db=# SELECT name, setting FROM pg_settings where name like 'lc%' OR name like '%encoding';
      name       | setting
-----------------+---------
 client_encoding | utf8
 lc_collate      | C
 lc_ctype        | C
 lc_messages     | es_ES
 lc_monetary     | C
 lc_numeric      | C
 lc_time         | C
 server_encoding | LATIN1
(8 rows)

latin1db=# SELECT * FROM foo;
ERROR: no existe la relación «foo»

The accented characters are garbled. When I try the same with a database
that's in UTF8 in the same cluster, it works:

utf8db=# SELECT name, setting FROM pg_settings where name like 'lc%' OR name like '%encoding';
      name       | setting
-----------------+---------
 client_encoding | UTF8
 lc_collate      | C
 lc_ctype        | C
 lc_messages     | es_ES
 lc_monetary     | C
 lc_numeric      | C
 lc_time         | C
 server_encoding | UTF8
(8 rows)

utf8db=# SELECT * FROM foo;
ERROR: no existe la relación «foo»

What's happening is that gettext() returns the message in the encoding
determined by LC_CTYPE, while we expect it in the database encoding.
Starting with PG 8.3, we enforce that the encoding implied by LC_CTYPE
matches the database encoding, except when LC_CTYPE is the C locale.
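To make the mechanism concrete, here is a minimal C sketch, assuming a POSIX system with GNU-style gettext, that reports which codeset a given LC_CTYPE implies. When no codeset has been bound with bind_textdomain_codeset(), GNU gettext converts translations to this codeset; under LC_CTYPE=C on glibc it is ANSI_X3.4-1968 (plain ASCII), which cannot represent the accented characters. The helper name ctype_codeset() is hypothetical.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch LC_CTYPE to `locale_name` and return the
 * codeset it implies, i.e. what GNU gettext converts translations to
 * when no codeset has been bound for the domain.
 * Returns NULL if the locale is not installed.
 * Note: setlocale() changes process-wide state. */
static const char *ctype_codeset(const char *locale_name)
{
    assert(locale_name != NULL);
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

On glibc, ctype_codeset("C") returns "ANSI_X3.4-1968", which is why a LATIN1 database running with lc_ctype = C gets mangled Spanish messages even though lc_messages is es_ES.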

In CVS HEAD, we call bind_textdomain_codeset() in SetDatabaseEncoding(),
which fixes that, but we only do it on Windows. In earlier versions we
called it on all platforms, but only for UTF-8. It seems that we should
call bind_textdomain_codeset() on all platforms and for all encodings.
However, there seems to be a reason why CVS HEAD only does it on
Windows: we need a mapping from our encoding ID to the OS codeset name,
and the OS codeset names vary between platforms.
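One possible direction, sketched in C under stated assumptions: keep several candidate spellings per encoding, let the platform's iconv_open() decide which one it understands, and pass that name to bind_textdomain_codeset(). The enum, the table contents, and the helper names below are hypothetical illustrations, not PostgreSQL's actual encoding table; the sketch assumes a POSIX system with GNU-style libintl and iconv.

```c
#include <assert.h>
#include <iconv.h>
#include <libintl.h>
#include <stddef.h>

#define N_CANDIDATES 3

/* Hypothetical subset of server encodings (not PostgreSQL's real enum). */
typedef enum { ENC_SQL_ASCII, ENC_UTF8, ENC_LATIN1, ENC_LATIN9 } demo_encoding;

/* Candidate OS codeset names per encoding, most common spelling first.
 * Spellings differ between iconv implementations, hence several per row.
 * SQL_ASCII gets no entry: it makes no claim about the byte encoding. */
static const char *const codeset_candidates[][N_CANDIDATES] = {
    [ENC_SQL_ASCII] = {NULL, NULL, NULL},
    [ENC_UTF8]      = {"UTF-8", "UTF8", NULL},
    [ENC_LATIN1]    = {"ISO-8859-1", "LATIN1", "ISO8859-1"},
    [ENC_LATIN9]    = {"ISO-8859-15", "LATIN9", "ISO8859-15"},
};

/* Return the first candidate name that this platform's iconv accepts,
 * or NULL if none of them is recognized. */
static const char *find_os_codeset(demo_encoding enc)
{
    assert((size_t) enc < sizeof codeset_candidates / sizeof codeset_candidates[0]);
    for (size_t i = 0; i < N_CANDIDATES; i++) {
        const char *name = codeset_candidates[enc][i];
        if (name == NULL)
            break;
        iconv_t cd = iconv_open(name, "UTF-8");
        if (cd != (iconv_t) -1) {
            iconv_close(cd);
            return name;
        }
    }
    return NULL;
}

/* Ask gettext to return messages of `domain` in the given encoding.
 * Returns 1 if a codeset was bound, 0 if we fell back to the
 * LC_CTYPE-based default. */
static int bind_domain_to_encoding(const char *domain, demo_encoding enc)
{
    const char *codeset = find_os_codeset(enc);
    if (codeset == NULL)
        return 0;
    return bind_textdomain_codeset(domain, codeset) != NULL;
}
```

For example, on glibc find_os_codeset(ENC_LATIN1) returns "ISO-8859-1". Probing with iconv_open() costs a little work at startup but avoids hard-coding each platform's spelling; an encoding with no recognized name simply keeps gettext's LC_CTYPE-based behavior.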

How can we make this more robust?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
