Re: prevent encoding conversion recursive error

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-patches(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: prevent encoding conversion recursive error
Date: 2005-08-09 02:21:28
Message-ID: 11386.1123554088@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

"Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> Yeah, it is not a very clean solution. Do you mean the general problem is
> "prevent recursive error reporting because of the error in translating error
> message"?

> I put the image of the reporting email here:
> http://www.cs.toronto.edu/~zhouqq/encode.jpg

Actually, I believe the general problem is that the gettext software
is doing the wrong internal character-set conversion for translated
message texts.

I can get this same crash on a Linux machine if I have server encoding
= utf8 and client encoding = gb18030 and I set lc_messages = zh_TW
... but if I instead make lc_messages = zh_CN, no problem. The backend
zh_TW.po file contains

msgid "ignoring unconvertible UTF-8 character 0x%04x"
msgstr "UTF-80x%04x"

and if I read the header correctly, this is claimed to be in UTF8
encoding. So it ought to be delivered as-is when in a UTF8 database.
But tracing through the failure with gdb, I see that what is actually
delivered back from gettext() is

(gdb) p str
$1 = 0x82e8a74 "???UTF-80xd4da"
(gdb) x/32cx str
0x82e8a74: 0xba 0xf6 0xc2 0xd4 0x3f 0xb7 0xa8 0x3f
0x82e8a7c: 0x3f 0xb5 0xc4 0x55 0x54 0x46 0x2d 0x38
0x82e8a84: 0xd7 0xd6 0xd4 0xaa 0x30 0x78 0x64 0x34
0x82e8a8c: 0x64 0x61 0x00 0x7e 0x7f 0x7f 0x7f 0x7f
(gdb)

so some sort of conversion has taken place. I had originally initialized
the database with initdb --locale=zh_CN, which was interpreted by
Postgres as requesting EUC_CN encoding. I suspect the above is the
EUC_CN equivalent of the message text from the .po file, and that the
real problem is that gettext() has not been told the correct character
set to convert messages to.
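
One way to check that suspicion offline is to take the bytes from the gdb
dump above and see which encoding they are actually valid in. The following
is only a minimal Python sketch, not part of any patch: the helper name
try_decode is made up for illustration, and it assumes Python's "gb2312"
codec is an acceptable stand-in for EUC_CN. If the bytes decode cleanly as
EUC_CN but not as UTF-8, with '?' marks where characters could not be
represented, that would be consistent with gettext() converting the message
to the wrong character set.

```python
# Bytes copied from the gdb dump above, up to (not including) the NUL terminator.
dump = bytes.fromhex(
    "baf6c2d43fb7a83f"
    "3fb5c45554462d38"
    "d7d6d4aa30786434"
    "6461"
)


def try_decode(data, encoding):
    """Return the decoded text, or None if data is not valid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Assumption: Python's "gb2312" codec is used here as the equivalent of EUC_CN.
as_utf8 = try_decode(dump, "utf-8")
as_euc_cn = try_decode(dump, "gb2312")

print("valid UTF-8: ", as_utf8 is not None)
print("valid EUC_CN:", as_euc_cn is not None)
if as_euc_cn is not None:
    # '?' bytes in the dump mark characters the conversion could not represent.
    print(as_euc_cn, "-", as_euc_cn.count("?"), "substitution marks")
```

This only inspects the string that came back from gettext(); it says nothing
about where the conversion target was chosen.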

ISTM we've seen this issue before and Peter had an idea how to fix it,
but I forget the details. Peter?

regards, tom lane
