Re: [bug fix] strerror() returns ??? in a UTF-8/C database with LC_MESSAGES=non-ASCII

From: Noah Misch <noah(at)leadboat(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [bug fix] strerror() returns ??? in a UTF-8/C database with LC_MESSAGES=non-ASCII
Date: 2013-09-10 02:28:47
Message-ID: 20130910022847.GB219994@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 10, 2013 at 05:42:06AM +0900, MauMau wrote:
> From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>> Noah Misch <noah(at)leadboat(dot)com> writes:
>>> ... I think
>>> MauMau's original bind_textdomain_codeset() proposal was on the right
>>> track.
>>
>> It might well be. My objection was to the proposal for back-patching it
>> when we have little idea of the possible side-effects.

Agreed.

> We are using 9.1/9.2 and 9.2 is probably dominant, so I would be relieved
> with either of the following choices:
>
> 1. Take the approach that doesn't use bind_textdomain_codeset("libc")
> (i.e. the second version of errno_str.patch) for 9.4 and older releases.
>
> 2. Use bind_textdomain_codeset("libc") (i.e. take strerror_codeset.patch)
> for 9.4, and take the non-bind_textdomain_codeset approach for older
> releases.

I like (2), at least at a high level. The concept of errno_str.patch is safe
enough to back-patch. One can verify that it only changes behavior when
strerror() returns NULL, an empty string, or something that begins with '?'.
I can't see resenting the change when that has happened.

Note that you can work around the problem today by linking PostgreSQL with a
better iconv() implementation.

Question-mark-damaged messages are not limited to strerror(). A combination
like lc_messages=ja_JP, encoding=LATIN1, lc_ctype=en_US will produce question
marks for PG and libc messages even with the bind_textdomain_codeset("libc")
change. Is it worth doing anything about that? That one looks self-inflicted
in comparison to the lc_messages=ja_JP, encoding=UTF8, lc_ctype=C case.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-09-10 03:29:59 Re: Protocol forced to V2 in low-memory conditions?
Previous Message Peter Eisentraut 2013-09-10 02:27:26 Re: Re: Privileges for INFORMATION_SCHEMA.SCHEMATA (was Re: [DOCS] Small clarification in "34.41. schemata")