[bug fix] strerror() returns ??? in a UTF-8/C database with LC_MESSAGES=non-ASCII

From: "MauMau" <maumau307(at)gmail(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: [bug fix] strerror() returns ??? in a UTF-8/C database with LC_MESSAGES=non-ASCII
Date: 2013-09-06 13:40:10
Message-ID: 2782A2665E8342DF8695F396DBA80C88@maumau
Lists: pgsql-hackers

Hello,

I've been suffering from PostgreSQL's problems related to character encoding
for some time. I really wish to solve those problems, because they make
troubleshooting difficult. I'm going to propose fixes for them, and I would
appreciate it if you could help release the official patches as soon as
possible.

The first issue is that the messages from strerror() become "???" in a
typical locale/encoding combination. I found this was reported in 2010, but
it was not solved.

problem with glibc strerror messages translation (was: Could not open file
pg_xlog/000000010....)
http://www.postgresql.org/message-id/87pqvezp3w.fsf@home.progtech.ru

The steps to reproduce the problem are:

$ export LANG=ja_JP.UTF-8
$ initdb -E UTF8 --no-locale --lc-messages=ja_JP
$ pg_ctl start
$ psql -d postgres -c "CREATE TABLE a (col int)"
$ psql -d postgres -c "SELECT pg_relation_filepath('a')"
... This outputs something like base/xxx/yyy
$ mv $PGDATA/base/xxx/yyy a
$ psql -d postgres -c "SELECT * FROM a"
... This outputs, in Japanese, a message meaning "could not open file
"base/xxx/yyy": ???".

The problem is that strerror() returns "???", which hides the cause of the
trouble.

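The cause is that gettext(), called by strerror(), tries to convert the UTF-8
messages obtained from libc.mo to ASCII, and the Japanese characters that
have no ASCII equivalent come out as "?". The conversion targets ASCII
because postgres calls setlocale(LC_CTYPE, "C") when it connects to the
database.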
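
To make the cause concrete, here is a minimal sketch (not PostgreSQL code)
that shows which codeset the C library associates with a given LC_CTYPE
setting. gettext() uses that codeset as its default output encoding. The
function name codeset_for_ctype is hypothetical and exists only for
illustration.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Switch LC_CTYPE to the given locale and report the codeset that the
 * C library now associates with it.  Returns NULL if the locale cannot
 * be set.  (Hypothetical helper, for illustration only.)
 */
const char *
codeset_for_ctype(const char *ctype_locale)
{
    assert(ctype_locale != NULL);

    if (setlocale(LC_CTYPE, ctype_locale) == NULL)
        return NULL;

    /* gettext() converts catalog messages to this codeset by default. */
    return nl_langinfo(CODESET);
}
```

Calling codeset_for_ctype("C") on glibc typically reports an ASCII codeset
(named ANSI_X3.4-1968 there), which is why a UTF-8 Japanese message cannot
survive the conversion.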

Thus, I have attached a patch (strerror_codeset.patch). This simple patch just
sets the codeset for the libc catalog to the same one used for the postgres
catalog. As noted in the comment, I understand this is a kludge based on an
undocumented fact (the catalog for strerror() is libc.mo), and it may not work
in all environments. However, it will help many people who work in non-English
regions. Please don't reject this solely because the implementation is not
clean. If there is a better idea that can be implemented easily, I'd be happy
to hear it.
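
The idea can be sketched roughly as follows. This is not the content of the
attached patch, only an illustration of the technique under two assumptions:
the gettext domain used by strerror() is named "libc" (the undocumented fact
mentioned above), and the caller passes the same codeset name that is used
for the postgres catalog. The function name set_libc_message_codeset is
hypothetical.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>

/*
 * Ask gettext to return messages from the "libc" domain in the given
 * codeset, independent of the current LC_CTYPE setting.
 * Returns 0 on success, -1 if the C library refused the request.
 * ASSUMPTION: strerror() looks its messages up in the "libc" domain.
 */
int
set_libc_message_codeset(const char *codeset)
{
    assert(codeset != NULL);

    /* bind_textdomain_codeset() returns the codeset now in effect. */
    if (bind_textdomain_codeset("libc", codeset) == NULL)
        return -1;
    return 0;
}
```

With set_libc_message_codeset("UTF-8") in effect, strerror() in a UTF-8
database would be expected to return the translated text as-is instead of
attempting a conversion to ASCII.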

I'm also attaching another patch, errno_str.patch, which adds the numeric
value of errno to %m in ereport() like:

could not open file "base/xxx/yyy": errno=2: No such file or directory

When talking with operating system experts, numeric errno values are
sometimes more useful and easier to communicate than their corresponding
strings. This is a closely related but separate proposal.
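
As a rough sketch of the output format shown above (not the code in
errno_str.patch, and not PostgreSQL's actual %m handling), a standalone
helper could build the string like this. The name format_errno_message is
hypothetical.

```c
#include <assert.h>
#include <errno.h>      /* errno and the E* constants used by callers */
#include <stdio.h>
#include <string.h>

/*
 * Write "errno=<number>: <message>" for errnum into buf.
 * Returns the value of snprintf(), i.e. the length the full string
 * would have, so the caller can detect truncation.
 * (Hypothetical helper, for illustration only.)
 */
int
format_errno_message(int errnum, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    return snprintf(buf, buflen, "errno=%d: %s", errnum, strerror(errnum));
}
```

A caller would normally save errno right after the failing system call and
pass the saved value, for example format_errno_message(saved_errno, buf,
sizeof(buf)).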

I want the first patch to be backported at least to 9.2.

Regards
MauMau

Attachment              Content-Type              Size
strerror_codeset.patch  application/octet-stream  672 bytes
errno_str.patch         application/octet-stream  1.1 KB
