BUG #2685: Wrong charset of server messages on client [PATCH]

From: "Sergiy Vyshnevetskiy" <serg(at)vostok(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #2685: Wrong charset of server messages on client [PATCH]
Date: 2006-10-10 14:55:29
Message-ID: 200610101455.k9AEtTTd085210@wwwmaster.postgresql.org
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 2685
Logged by: Sergiy Vyshnevetskiy
Email address: serg(at)vostok(dot)net
PostgreSQL version: 8.1
Operating system: FreeBSD-6 stable
Description: Wrong charset of server messages on client [PATCH]
Details:

DESCRIPTION:

The PostgreSQL backend uses gettext() to localize its messages. By default,
the charset of the localized messages is determined by LC_CTYPE.

The message is then processed through a sprintf-like mechanism (with database
data as possible arguments) and fed to send_message_to_frontend(), which
converts the data from the _database_charset_(!) to the client charset.

If the LC_CTYPE charset is not the same as (or at least binary compatible
with) the database charset, the client gets garbage characters in server
messages. If the database charset is UTF-8, the cluster may recursively
generate "invalid byte sequence for encoding" errors until it fills up
errordata[ERRORDATA_STACK_SIZE], and then it panics.
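
To illustrate the failure mode, here is a minimal sketch (not PostgreSQL
code) of a simplified UTF-8 well-formedness check. The function name
is_valid_utf8 and the sample message bytes are assumptions made up for this
example; the backend's real verifier is stricter. It shows that a message
produced in a single-byte charset such as ISO-8859-1 is not a valid byte
sequence when the receiver expects UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* The word "ungültig" encoded as UTF-8 (ü = 0xC3 0xBC). */
const unsigned char msg_utf8[] = {'u', 'n', 'g', 0xC3, 0xBC, 'l', 't', 'i', 'g'};

/* The same word encoded as ISO-8859-1 (ü = 0xFC). */
const unsigned char msg_latin1[] = {'u', 'n', 'g', 0xFC, 'l', 't', 'i', 'g'};

/*
 * Return 1 if buf[0..len) has well-formed UTF-8 lead/continuation structure,
 * 0 otherwise. Simplified on purpose: it does not reject surrogates or
 * every overlong 3- and 4-byte form.
 */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;            /* number of continuation bytes expected */

        if (c < 0x80)
            need = 0;
        else if (c >= 0xC2 && c <= 0xDF)
            need = 1;
        else if (c >= 0xE0 && c <= 0xEF)
            need = 2;
        else if (c >= 0xF0 && c <= 0xF4)
            need = 3;
        else
            return 0;           /* stray continuation byte or invalid lead */

        if (len - i - 1 < need)
            return 0;           /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;       /* continuation byte missing */

        i += need + 1;
    }
    return 1;
}
```

Here is_valid_utf8(msg_utf8, sizeof msg_utf8) accepts the first buffer, while
the ISO-8859-1 buffer is rejected at the 0xFC byte. That rejection is the
kind of condition that would be reported as "invalid byte sequence for
encoding".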

SOLUTION:

Convert server messages to database charset.

PATCH:

--- src/backend/utils/mb/mbutils.c.o0 Tue Oct 10 11:51:13 2006
+++ src/backend/utils/mb/mbutils.c Tue Oct 10 11:49:22 2006
@@ -615,6 +615,7 @@
 	DatabaseEncoding = &pg_enc2name_tbl[encoding];
 	Assert(DatabaseEncoding->encoding == encoding);
 #ifdef USE_ICU
+	bind_textdomain_codeset("postgres",(&pg_enc2iananame_tbl[encoding])->name);
 	ucnv_setDefaultName((&pg_enc2iananame_tbl[encoding])->name);
 #endif
 }

This, however, uncovers another bug: PostgreSQL dumps messages into
stderr/syslog as-is, without converting database data from the database
charset to the charset from LC_MESSAGES. With this patch applied, the same
happens to the message text as well. The fix should be trivial: set up a
conversion from the database charset to the server charset. I will post a
patch for it later.
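
As a rough illustration of what such a conversion step involves, here is a
standalone sketch built on POSIX iconv(). This is a stand-in technique chosen
for the example, not the promised patch and not PostgreSQL's own conversion
machinery. The helper name convert_charset and the charset names used with it
are assumptions.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Convert in[0..in_len) from charset from_cs to charset to_cs, writing at
 * most out_cap bytes into out. Returns the number of bytes written, or
 * (size_t)-1 if the conversion is unavailable, the input is invalid, or
 * the output buffer is too small.
 */
size_t convert_charset(const char *from_cs, const char *to_cs,
                       const char *in, size_t in_len,
                       char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to_cs, from_cs);   /* note: target comes first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;     /* iconv() wants char **, input is not modified */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dst_left);   /* flush any shift state */

    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_cap - dst_left;
}
```

For example, convert_charset("UTF-8", "ISO-8859-1", msg, len, buf, sizeof buf)
would turn a UTF-8 message into the bytes a Latin-1 log reader expects. A
real fix would also have to decide what to do with characters that have no
representation in the server charset.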

NOTE:

I used pg_enc2iananame_tbl instead of pg_enc2name_tbl, because gettext
doesn't accept many

Possible TODO:
Change PostgreSQL charset names to IANA-standard names.
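
A minimal sketch of the kind of name translation involved. The table below is
hand-written for illustration and is not the contents of
pg_enc2iananame_tbl; the struct and function names are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One PostgreSQL-style encoding name and its IANA-registered equivalent. */
struct enc_alias {
    const char *pg_name;
    const char *iana_name;
};

/* Illustrative subset only; a real table would cover every encoding. */
static const struct enc_alias enc_aliases[] = {
    {"UTF8",    "UTF-8"},
    {"LATIN1",  "ISO-8859-1"},
    {"LATIN2",  "ISO-8859-2"},
    {"WIN1251", "windows-1251"},
    {"EUC_JP",  "EUC-JP"},
};

/*
 * Look up the IANA name for a PostgreSQL-style encoding name.
 * Returns NULL when the name is not in the table, so the caller can
 * decide whether to skip the codeset binding or report an error.
 */
const char *iana_name_for(const char *pg_name)
{
    assert(pg_name != NULL);

    for (size_t i = 0; i < sizeof enc_aliases / sizeof enc_aliases[0]; i++)
        if (strcmp(enc_aliases[i].pg_name, pg_name) == 0)
            return enc_aliases[i].iana_name;

    return NULL;
}
```

The value returned by iana_name_for() is the form of name that would be
passed as the codeset argument of bind_textdomain_codeset().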
