
BUG #2685: Wrong charset of server messages on client [PATCH]

From: "Sergiy Vyshnevetskiy" <serg(at)vostok(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #2685: Wrong charset of server messages on client [PATCH]
Date: 2006-10-10 14:55:29
Message-ID: 200610101455.k9AEtTTd085210@wwwmaster.postgresql.org
Lists: pgsql-bugs
The following bug has been logged online:

Bug reference:      2685
Logged by:          Sergiy Vyshnevetskiy
Email address:      serg(at)vostok(dot)net
PostgreSQL version: 8.1
Operating system:   FreeBSD-6 stable
Description:        Wrong charset of server messages on client [PATCH]
Details: 

DESCRIPTION:

PostgreSQL backend uses gettext() to localize its messages. The charset of
localized messages is determined by LC_CTYPE by default.
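Here is a minimal sketch of the gettext side. It assumes glibc's libintl; other libintl implementations may behave differently. Only bind_textdomain_codeset() is a real libintl call. The helper names set_message_codeset() and bound_message_codeset() are hypothetical. When no codeset is bound for a domain, gettext() falls back to the LC_CTYPE charset, which is the default behaviour described above.

```c
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: make gettext() return translations for `domain`
 * in `codeset` instead of the LC_CTYPE charset. Returns nonzero on success. */
int set_message_codeset(const char *domain, const char *codeset)
{
    const char *now = bind_textdomain_codeset(domain, codeset);
    return now != NULL && strcmp(now, codeset) == 0;
}

/* Hypothetical helper: returns the codeset bound to `domain`, or NULL if
 * none has been bound, in which case gettext() uses the LC_CTYPE charset. */
const char *bound_message_codeset(const char *domain)
{
    return bind_textdomain_codeset(domain, NULL);
}
```

Calling set_message_codeset("postgres", "UTF-8") early in startup roughly corresponds to what the patch below does. The difference is that the patch takes the codeset name from pg_enc2iananame_tbl for the actual database encoding.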

The message is then formatted through a sprintf-like mechanism (possibly with
database data as arguments) and passed to send_message_to_frontend(), which
converts it from the _database_charset_(!) to the client charset.

If LC_CTYPE is not the same as (or at least binary compatible with) the
database charset, the client receives garbage characters in server messages.
If the database charset is UTF-8, the cluster may recursively generate
"invalid byte sequence for encoding" errors until it fills up
errordata[ERRORDATA_STACK_SIZE], at which point it panics.
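The failure mode can be reproduced outside PostgreSQL. This sketch uses POSIX iconv() as a stand-in for the backend's own conversion routines, which is an assumption for illustration only. It takes a message in KOI8-R, as gettext() would produce it under a KOI8-R LC_CTYPE, and treats it as UTF-8 database text. The hypothetical helper convert_message() then reports EILSEQ, the analogue of "invalid byte sequence for encoding". The recursive error reporting and the eventual panic are not modelled.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `len` bytes of `src` from charset `from`
 * to charset `to`, writing into `dst`. Returns 0 on success, EILSEQ if
 * `src` is not valid in `from`, or another errno value on other failures
 * (e.g. EINVAL for an unknown charset, E2BIG if `dst` is too small). */
int convert_message(const char *from, const char *to,
                    const char *src, size_t len, char *dst, size_t dstlen)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return errno;

    char *in = (char *) src;   /* iconv() takes char **, not const char ** */
    char *out = dst;
    size_t inleft = len;
    size_t outleft = dstlen;
    int rc = 0;

    if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t) -1)
        rc = errno;            /* EILSEQ: invalid byte sequence in input */

    iconv_close(cd);
    return rc;
}
```

The bytes 0xEF 0xDB 0xC9 0xC2 0xCB 0xC1 spell "Ошибка" in KOI8-R. Decoding them as UTF-8 fails at the second byte, because 0xEF starts a three-byte sequence and 0xDB is not a continuation byte. Decoding the same bytes as KOI8-R succeeds.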

SOLUTION:

Convert server messages to database charset.

PATCH:

--- src/backend/utils/mb/mbutils.c.o0	Tue Oct 10 11:51:13 2006
+++ src/backend/utils/mb/mbutils.c	Tue Oct 10 11:49:22 2006
@@ -615,6 +615,7 @@
 	DatabaseEncoding = &pg_enc2name_tbl[encoding];
 	Assert(DatabaseEncoding->encoding == encoding);
 #ifdef USE_ICU
+	bind_textdomain_codeset("postgres",(&pg_enc2iananame_tbl[encoding])->name);
 	ucnv_setDefaultName((&pg_enc2iananame_tbl[encoding])->name);
 #endif
 }

This, however, uncovers another bug: PostgreSQL writes messages to
stderr/syslog as-is, without converting database data from the database
charset to the charset of LC_MESSAGES. After this patch, the message text
itself (now in the database charset) will also be written unconverted. The
fix should be trivial: set up a conversion from the database charset to the
server charset. I will post a patch for it later.

NOTE:

I used pg_enc2iananame_tbl instead of pg_enc2name_tbl, because gettext
doesn't accept many 

Possible TODO:
Change PostgreSQL charset names to IANA-standard names.
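The following illustrates the kind of name mapping the NOTE and TODO refer to. The entries are a small hand-picked subset, and the names enc_aliases and pg_to_iana() are hypothetical. This is not PostgreSQL's actual pg_enc2iananame_tbl, which the patch indexes by encoding number rather than searching by name.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative subset only: PostgreSQL encoding names paired with the
 * IANA-registered (preferred MIME) names that gettext/iconv understand. */
struct enc_alias { const char *pg_name; const char *iana_name; };

static const struct enc_alias enc_aliases[] = {
    { "UTF8",    "UTF-8" },
    { "LATIN1",  "ISO-8859-1" },
    { "WIN1251", "windows-1251" },
    { "EUC_JP",  "EUC-JP" },
};

/* Returns the IANA name for a PostgreSQL encoding name, or NULL if the
 * name is not in this illustrative table. */
const char *pg_to_iana(const char *pg_name)
{
    for (size_t i = 0; i < sizeof enc_aliases / sizeof enc_aliases[0]; i++)
        if (strcmp(enc_aliases[i].pg_name, pg_name) == 0)
            return enc_aliases[i].iana_name;
    return NULL;
}
```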
