Re: NLS vs error processing, again

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: jw(dot)pgsql(at)sduept(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: NLS vs error processing, again
Date: 2006-04-05 03:20:47
Message-ID: 20060405.122047.98856262.t-ishii@sraoss.co.jp
Lists: pgsql-bugs

> JiangWei <jw(dot)pgsql(at)sduept(dot)com> writes:
> > LANG=zh_CN.UTF-8
> > [ set client_encoding to LATIN1 and provoke an error ]
>
> OK, I can reproduce the crash after initdb'ing with that LANG setting
> (in an nls-enabled build). The postmaster log fills with a whole lot
> of occurrences of
>
> ������: ��������������������� UTF-8 ������ 0x00e9
> ������: ��������������������� UTF-8 ������ 0x00e8
> ������: ��������������������� UTF-8 ������ 0x00e8
> ������: ��������������������� UTF-8 ������ 0x00e8
> ���������������������������������: ERRORDATA_STACK_SIZE exceeded
>
> Tracing through the dump shows that the error-handling code is
> recursively producing this warning while trying to translate the word
> WARNING to LATIN1. The zh_CN.po file shows the translation as
>
> #: utils/error/elog.c:1909
> msgid "WARNING"
> msgstr "警告"
>
> (which apparently is GB2312?)

It seems so. zh_CN.po has the line:

"Content-Type: text/plain; charset=GB2312\n"

This means that whoever wrote the file at least intended it to be
GB2312. However, please note that GB2312 is a character set, not an
encoding. In reality the file seems to be encoded in EUC-CN. Note
that I have confirmed this only by checking that the bytes above
(警告) are valid EUC-CN byte sequences. It is possible that the file
is not written in EUC-CN, but I think that is very unlikely.
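
The byte check described above can be sketched roughly as follows. This is only a structural test, assuming that an EUC-CN two-byte character has both bytes in the range 0xA1..0xFE; it is not PostgreSQL's own verifier, and the function name is made up for illustration. The sample string is the two-character translation of WARNING, written with Unicode escapes.

```python
def looks_like_euc_cn(data: bytes) -> bool:
    """Structural check only: ASCII bytes stand alone, every other
    character must be a pair of bytes in 0xA1..0xFE (assumed range)."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:            # plain ASCII, one byte
            i += 1
            continue
        if not (0xA1 <= lead <= 0xFE):
            return False           # not a valid lead byte
        if i + 1 >= len(data) or not (0xA1 <= data[i + 1] <= 0xFE):
            return False           # missing or invalid trail byte
        i += 2
    return True


# The translated word for WARNING, as Unicode code points U+8B66 U+544A.
WARNING_ZH = "\u8b66\u544a"

euc_bytes = WARNING_ZH.encode("gb2312")   # Python's gb2312 codec emits EUC-CN bytes
utf8_bytes = WARNING_ZH.encode("utf-8")

print(looks_like_euc_cn(euc_bytes))    # the EUC-CN form passes the check
print(looks_like_euc_cn(utf8_bytes))   # the UTF-8 form of the same text does not
```

A check like this cannot prove that a file is EUC-CN, only that its bytes are not inconsistent with it, which is the same caveat as in the paragraph above.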

> and what's actually getting passed to
> utf8_to_iso8859_1() is
>
> (gdb) x/6o str
> 0x8b89d8: 0350 0255 0246 0345 0221 0212
>
> I have no idea if this is a correct UTF8 transliteration of the GB2312
> phrase --- can anyone confirm?

As far as I can tell from utils/mb/Unicode/euc_cn_to_utf8.map, the
translation above seems to be correct. BTW, what does the conversion
from EUC-CN to UTF-8? Maybe gettext()?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> But anyway, if this is Chinese then it's
> hardly surprising that there would be no LATIN1 equivalent. And then
> trying to report the problem gets us into a new instance of the same
> problem. Even the code that's supposed to stop error recursion doesn't
> get us out of it.
>
> It seems to me that there basically is no graceful solution to this sort
> of mismatch. It might be possible to kluge things so that we disable
> NLS once we've recursed too many times in error processing, but that's
> surely pretty ugly. What would be a lot more user-friendly would be to
> refuse the attempt to set client_encoding to something that can't handle
> our error message encoding, but I don't know what a reasonable set of
> restrictions would be.
>
> Comments?
>
> regards, tom lane
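
To illustrate the mismatch and the "disable NLS" idea discussed above, here is a minimal sketch. It assumes a hypothetical helper, render_for_client, that falls back to the untranslated ASCII msgid when the translated text cannot be represented in the client encoding. It is not how elog.c works, only an illustration of why the fallback stops the recursion: the fallback path cannot itself fail on encoding.

```python
def render_for_client(msgid: str, translated: str, client_encoding: str) -> bytes:
    """Hypothetical helper: prefer the translated text, but fall back to
    the untranslated msgid instead of raising a second error."""
    try:
        return translated.encode(client_encoding)
    except UnicodeEncodeError:
        # The msgid is plain ASCII, so this conversion cannot fail and
        # no new conversion error is reported.
        return msgid.encode("ascii")


warning_zh = "\u8b66\u544a"   # the zh_CN translation of WARNING

# LATIN1 has no equivalent for these characters, so the msgid is used.
print(render_for_client("WARNING", warning_zh, "latin-1"))

# A UTF-8 client can represent the translation, so it is sent as is.
print(render_for_client("WARNING", warning_zh, "utf-8"))
```

The alternative raised in the message, refusing the client_encoding setting up front, would need the same representability test applied to every message in the catalog, which is why a reasonable set of restrictions is hard to state.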
