More message encoding woes

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: More message encoding woes
Date: 2009-03-30 12:52:37
Message-ID: 49D0C095.8000304@enterprisedb.com
Lists: pgsql-hackers

latin1db=# SELECT version();
                                      version
-----------------------------------------------------------------------------------
 PostgreSQL 8.3.7 on i686-pc-linux-gnu, compiled by GCC gcc (Debian 4.3.3-5) 4.3.3
(1 row)

latin1db=# SELECT name, setting FROM pg_settings where name like 'lc%' OR name like '%encoding';
      name       | setting
-----------------+---------
 client_encoding | utf8
 lc_collate      | C
 lc_ctype        | C
 lc_messages     | es_ES
 lc_monetary     | C
 lc_numeric      | C
 lc_time         | C
 server_encoding | LATIN1
(8 rows)

latin1db=# SELECT * FROM foo;
ERROR:  no existe la relación «foo»

The accented characters are garbled. When I try the same with a database 
that's in UTF8 in the same cluster, it works:

utf8db=# SELECT name, setting FROM pg_settings where name like 'lc%' OR name like '%encoding';
      name       | setting
-----------------+---------
 client_encoding | UTF8
 lc_collate      | C
 lc_ctype        | C
 lc_messages     | es_ES
 lc_monetary     | C
 lc_numeric      | C
 lc_time         | C
 server_encoding | UTF8
(8 rows)

utf8db=# SELECT * FROM foo;
ERROR:  no existe la relación «foo»

What is happening is that gettext() returns the message in the encoding
determined by LC_CTYPE, while we expect it to return it in the database
encoding. Starting with PG 8.3 we enforce that the encoding implied by
LC_CTYPE matches the database encoding, but the C locale is exempt from
that check, and both databases above use lc_ctype = C.

In CVS HEAD, we call bind_textdomain_codeset() in SetDatabaseEncoding(),
which fixes that, but we only do it on Windows. In earlier versions we
called it on all platforms, but only for UTF-8. It seems that we should
call bind_textdomain_codeset() on all platforms and for all encodings.
However, there seems to be a reason why we only do it on Windows in CVS
HEAD: we need a mapping from our encoding ID to the OS codeset name, and
the OS codeset names vary.
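
To make the idea concrete, here is a rough sketch of what "call
bind_textdomain_codeset() for all encodings" could look like. It is not
the code in CVS HEAD. The demo_* names, the three-entry enum and the
mapping table are invented for illustration, and the codeset spellings
("ISO-8859-1", "UTF-8") are the ones glibc's iconv accepts; other
platforms may want different names, which is exactly the open problem.

```c
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the backend's encoding IDs (not the real enum). */
typedef enum
{
	DEMO_SQL_ASCII,
	DEMO_LATIN1,
	DEMO_UTF8
} demo_encoding;

/* One row of the encoding ID -> OS codeset name mapping. */
typedef struct
{
	demo_encoding enc;
	const char *codeset;		/* assumed glibc/iconv spelling */
} demo_codeset_entry;

static const demo_codeset_entry demo_codesets[] = {
	{DEMO_LATIN1, "ISO-8859-1"},
	{DEMO_UTF8, "UTF-8"},
};

/* Return the OS codeset name for an encoding, or NULL if we have none. */
const char *
demo_codeset_name(demo_encoding enc)
{
	size_t		i;

	for (i = 0; i < sizeof(demo_codesets) / sizeof(demo_codesets[0]); i++)
	{
		if (demo_codesets[i].enc == enc)
			return demo_codesets[i].codeset;
	}
	return NULL;
}

/*
 * Ask gettext to return messages of "domain" in the codeset that matches
 * "enc". Returns 1 if the codeset was set, 0 if there is no mapping or
 * the call did not take effect; in that case gettext keeps deriving the
 * output encoding from LC_CTYPE.
 */
int
demo_bind_message_codeset(const char *domain, demo_encoding enc)
{
	const char *codeset = demo_codeset_name(enc);
	const char *bound;

	if (codeset == NULL)
		return 0;

	bound = bind_textdomain_codeset(domain, codeset);
	return bound != NULL && strcmp(bound, codeset) == 0;
}
```

With something like this, the database-encoding setup would call
demo_bind_message_codeset("postgres", DEMO_LATIN1) for the LATIN1
database above. An encoding with no table entry (SQL_ASCII here) is
left alone rather than guessed at.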

How can we make this more robust?

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
