
Re: NLS vs error processing, again

From:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	jw(dot)pgsql(at)sduept(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject:	Re: NLS vs error processing, again
Date:	2006-04-05 03:20:47
Message-ID:	20060405.122047.98856262.t-ishii@sraoss.co.jp
Lists:	pgsql-bugs

> JiangWei <jw(dot)pgsql(at)sduept(dot)com> writes:
> > LANG=zh_CN.UTF-8
> > [ set client_encoding to LATIN1 and provoke an error ]
>
> OK, I can reproduce the crash after initdb'ing with that LANG setting
> (in an nls-enabled build). The postmaster log fills with a whole lot
> of occurrences of
>
> ��: �� UTF-8 �� 0x00e9
> ��: �� UTF-8 �� 0x00e8
> ��: �� UTF-8 �� 0x00e8
> ��: �� UTF-8 �� 0x00e8
> ��: ERRORDATA_STACK_SIZE exceeded
>
> Tracing through the dump shows that the error-handling code is
> recursively producing this warning while trying to translate the word
> WARNING to LATIN1. The zh_CN.po file shows the translation as
>
> #: utils/error/elog.c:1909
> msgid "WARNING"
> msgstr "警告"
>
> (which apparently is GB2312?)

It seems so. zh_CN.po has the line:

"Content-Type: text/plain; charset=GB2312\n"

This means that whoever wrote the file intended it to be GB2312.
However, please note that GB2312 is a character set, not an
encoding. In reality the file seems to be encoded in EUC-CN; I
confirmed this simply by checking that the bytes of the msgstr above
(警告) are valid EUC-CN byte sequences. It is possible that the file
is not in EUC-CN, but I think that is very unlikely.
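The byte check described above can be sketched as a structural test. In EUC-CN, the usual encoding of the GB2312 character set, ASCII bytes stand alone and each Chinese character is a two-byte pair with both bytes in the range 0xA1 to 0xFE. A minimal Python sketch, assuming only that structural rule (it does not verify that a pair is an assigned GB2312 code point); the function name looks_like_euc_cn is hypothetical:

```python
def looks_like_euc_cn(data: bytes) -> bool:
    """Structural EUC-CN check (hypothetical helper).

    ASCII bytes (< 0x80) stand alone; any other byte must start a
    two-byte pair whose lead and trail bytes both lie in 0xA1-0xFE.
    Does not check that the pair is an assigned GB2312 character.
    """
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            i += 1
            continue
        if i + 1 >= len(data):
            return False  # truncated two-byte sequence
        trail = data[i + 1]
        if not (0xA1 <= b <= 0xFE and 0xA1 <= trail <= 0xFE):
            return False
        i += 2
    return True


# Python's "gb2312" codec produces EUC-CN bytes.
euc_cn = "警告".encode("gb2312")
utf8 = "警告".encode("utf-8")
print(euc_cn.hex(), looks_like_euc_cn(euc_cn))
print(utf8.hex(), looks_like_euc_cn(utf8))
```

The UTF-8 form of the same two characters fails the test at its fifth byte (0x91), which is one quick way to tell the two encodings apart when a file's declared charset is in doubt.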

> and what's actually getting passed to
> utf8_to_iso8859_1() is
>
> (gdb) x/6o str
> 0x8b89d8: 0350 0255 0246 0345 0221 0212
>
> I have no idea if this is a correct UTF8 transliteration of the GB2312
> phrase --- can anyone confirm?

As far as I can tell from utils/mb/Unicode/euc_cn_to_utf8.map, the
conversion above seems to be correct. BTW, what does the conversion
from EUC-CN to UTF-8? Maybe gettext()?
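The gdb dump can also be cross-checked without PostgreSQL's map files. A minimal Python sketch using only the standard codecs (Python's "gb2312" codec encodes GB2312 as EUC-CN); it is an independent check, not how the server performs the conversion:

```python
# Bytes printed by "x/6o str" in gdb (octal), copied from the thread.
dump = bytes([0o350, 0o255, 0o246, 0o345, 0o221, 0o212])

# Decodes to U+8B66 U+544A, i.e. "警告" (Chinese for "warning").
text = dump.decode("utf-8")
print(text, [hex(ord(c)) for c in text])

# Round-trip through EUC-CN to confirm the UTF-8 bytes are a faithful
# conversion of the GB2312 text in the .po file.
euc_cn = text.encode("gb2312")
assert euc_cn.decode("gb2312") == text

# The step that cannot succeed: these characters have no LATIN1 equivalent.
try:
    text.encode("latin-1")
except UnicodeEncodeError as exc:
    print("no LATIN1 equivalent:", exc.reason)
```

So the conversion into UTF-8 is correct; the failure only happens at the next step, into the client's LATIN1.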
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> But anyway, if this is Chinese then it's
> hardly surprising that there would be no LATIN1 equivalent. And then
> trying to report the problem gets us into a new instance of the same
> problem. Even the code that's supposed to stop error recursion doesn't
> get us out of it.
>
> It seems to me that there basically is no graceful solution to this sort
> of mismatch. It might be possible to kluge things so that we disable
> NLS once we've recursed too many times in error processing, but that's
> surely pretty ugly. What would be a lot more user-friendly would be to
> refuse the attempt to set client_encoding to something that can't handle
> our error message encoding, but I don't know what a reasonable set of
> restrictions would be.
>
> Comments?
>
> regards, tom lane
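The failure loop Tom describes, and his two candidate fixes, can be modeled in a few lines. A minimal Python sketch of a toy server, assuming that every report needs the translated WARNING label converted to the client encoding: MAX_ERROR_DEPTH, CATALOG, StackOverflow, report(), nls_cutoff and check_client_encoding() are all hypothetical names (not PostgreSQL's elog.c API), Python's str.encode stands in for the server's conversion routines, and the text "cannot convert character" is invented for the sketch.

```python
# Toy model only: none of these names are PostgreSQL's own.
MAX_ERROR_DEPTH = 5            # hypothetical stand-in for ERRORDATA_STACK_SIZE
CATALOG = {"WARNING": "警告"}  # hypothetical zh_CN message catalog


class StackOverflow(Exception):
    """Stands in for the 'ERRORDATA_STACK_SIZE exceeded' PANIC."""


def report(msg, client_encoding, depth=0, nls_cutoff=None):
    """Emit a WARNING in the client encoding, recursing on conversion failure.

    nls_cutoff models fix (a): from that recursion depth on, skip translation.
    """
    if depth >= MAX_ERROR_DEPTH:
        raise StackOverflow("ERRORDATA_STACK_SIZE exceeded")
    translate = nls_cutoff is None or depth < nls_cutoff
    label = CATALOG["WARNING"] if translate else "WARNING"
    try:
        return f"{label}: {msg}".encode(client_encoding)
    except UnicodeEncodeError as exc:
        # Reporting the failure needs the same translated label, so the
        # same conversion fails again one level deeper.
        bad = exc.object[exc.start]
        return report(f"cannot convert character {hex(ord(bad))}",
                      client_encoding, depth + 1, nls_cutoff)


def check_client_encoding(requested, catalog=CATALOG):
    """Fix (b): refuse an encoding that cannot represent the translations."""
    try:
        for translated in catalog.values():
            translated.encode(requested)
    except UnicodeEncodeError:
        return False
    return True


print(report("x", "utf-8").decode("utf-8"))
try:
    report("x", "latin-1")
except StackOverflow as exc:
    print("PANIC:", exc)
print(report("x", "latin-1", nls_cutoff=2))
print(check_client_encoding("utf-8"), check_client_encoding("latin-1"))
```

Without a cutoff the LATIN1 case never gets past the translated label and ends in the depth-limit exception, as in the log above. With nls_cutoff=2 it falls back to the English label, but the original message has been replaced by a report about the failure itself, which illustrates why this route is a kludge. check_client_encoding only tests the strings in the toy catalog; deciding which encodings a real server should refuse is the open question in the message.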

In response to

NLS vs error processing, again (was Re: Composite Type with Domain) at 2006-04-04 14:41:13 from Tom Lane

Responses

Re: NLS vs error processing, again at 2006-04-05 03:57:03 from Tom Lane
