Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding

From: Alexander Law <exclusion(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding
Date: 2012-07-25 11:54:23
Message-ID: 500FDE6F.3060202@gmail.com
Lists: pgsql-bugs pgsql-general pgsql-hackers

Hello,
I would like to fix this bug, but it looks like it will not be a
one-line patch.
Looking at the pg_dump code, I see that object names pass through the
following chain:
1. pg_dump executes 'SELECT c.tableoid, c.oid, c.relname, ... ' and gets
the object name in the encoding chosen for the db connection/dump.
2. It invokes write_msg or a similar function:
write_msg(NULL, "finding the columns and types of table \"%s\"\n",
tbinfo->dobj.name);
3. vwrite_msg localizes the message text, but not the argument(s):
vfprintf(stderr, _(fmt), ap);
Here gettext (_) internally converts fmt to the OS encoding (if it
differs from UTF-8, the encoding of the localized strings).
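
To illustrate why the name turns into gibberish, here is a minimal
sketch (not pg_dump code) that prints the raw bytes of one name in two
encodings. The helper hex_bytes and the sample name are hypothetical.
The byte strings are the Cyrillic word for "test" in UTF-8 and in
Windows-1251, written as hex escapes so the example does not depend on
the encoding of the source file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the bytes of s into out as
 * space-separated uppercase hex, truncating if out is too small.
 */
static void hex_bytes(const char *s, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen > 0)
        out[0] = '\0';

    /* Each byte needs at most 3 characters plus the terminating NUL. */
    for (size_t i = 0; s[i] != '\0' && used + 4 <= outlen; i++)
    {
        used += (size_t) snprintf(out + used, outlen - used, "%s%02X",
                                  i ? " " : "", (unsigned char) s[i]);
    }
}

/* The same four-letter name as it arrives in each encoding. */
static const char name_utf8[] = "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82";
static const char name_cp1251[] = "\xF2\xE5\xF1\xF2";
```

Calling hex_bytes on name_utf8 and name_cp1251 gives two different byte
sequences (eight bytes versus four). A console that expects
Windows-1251 receives the format string in that encoding, but the
UTF-8 bytes of the argument are passed through unchanged, so they are
rendered as unrelated characters.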

I can see only a few possible solutions to the problem:
1. Convert the object name at the backend, i.e. modify all the
similar SELECTs like this:
'SELECT c.tableoid, c.oid, c.relname, convert_to(c.relname,
'OS_ENCODING') AS locrelname, ...'
and then do write_msg(NULL, "finding the columns and types of table
\"%s\"\n", tbinfo->dobj.local_name);
The downside of this approach is that it requires rewriting all the
SELECTs for all the objects. It also doesn't help us write out any
other text from the backend, such as localized backend errors.

2. Set up another connection to the backend with the OS encoding and
get all the object names through it. That looks insane too. And we have
the same problem with localized backend errors arriving on the "main"
connection.

3. Write a convert_to_os_encoding(text, encoding) function for the
frontend utilities. Unfortunately, the frontend can't use the internal
PostgreSQL conversion functions, and modifying them to be usable
through libpq looks infeasible.
So the only way to implement such a function is to use another encoding
conversion framework (library).
And my question is: is it possible to add a libiconv dependency to the
frontend utilities code?
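
A rough sketch of what option 3 could look like on top of the POSIX
iconv API (iconv_open, iconv, iconv_close). This is only an
illustration under assumptions, not a proposed patch: the three-argument
signature, the malloc-based result, and the encoding names passed in
are all hypothetical, and which encoding names are accepted depends on
the iconv implementation in use.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert src from from_enc to to_enc.
 * Returns a malloc'd NUL-terminated string that the caller must free,
 * or NULL if the conversion is unsupported or fails.
 */
static char *convert_to_os_encoding(const char *src,
                                    const char *from_enc,
                                    const char *to_enc)
{
    iconv_t cd = iconv_open(to_enc, from_enc);

    if (cd == (iconv_t) -1)
        return NULL;            /* this encoding pair is not supported */

    size_t inleft = strlen(src);
    size_t outsize = inleft * 4 + 1;    /* generous bound for common encodings */
    char *out = malloc(outsize);

    if (out == NULL)
    {
        iconv_close(cd);
        return NULL;
    }

    char *inptr = (char *) src; /* iconv takes char **, input is not modified */
    char *outptr = out;
    size_t outleft = outsize - 1;

    /* Convert the input, then flush any pending shift state. */
    if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1 ||
        iconv(cd, NULL, NULL, &outptr, &outleft) == (size_t) -1)
    {
        free(out);
        iconv_close(cd);
        return NULL;
    }

    *outptr = '\0';
    iconv_close(cd);
    return out;
}
```

With such a helper, a caller like write_msg could convert
tbinfo->dobj.name from the dump encoding to the OS encoding before
formatting it, and fall back to the unconverted name when NULL is
returned. A character that has no representation in the target
encoding makes this sketch fail as a whole; a real implementation
would have to decide how to substitute such characters.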

4. Force users to use the OS encoding as the database encoding. Or to
avoid non-ASCII characters in db object names and to disable NLS on
Windows completely. That doesn't look like a solution at all.

BTW, this is not the only instance of the issue. For example, when I
try to use vacuumdb, I get completely unreadable messages:
http://oi48.tinypic.com/1c8j9.jpg
(blue marks what is in Russian or English; all the other text is gibberish).

Best regards,
Alexander

18.07.2012 12:51, Alexander Law wrote:
> Hello,
>
> The dump file itself is correct. The issue is only with the non-ASCII
> object names in pg_dump messages.
> The messages text (which is non-ASCII too) displayed consistently with
> right encoding (i.e. with OS encoding thanks to libintl/gettext), but
> encoding of db object names depends on the dump encoding and thus
> they're getting unreadable when different encoding is used.
> The same can be reproduced in Linux (where console encoding is UTF-8)
> when doing dump with Windows-1251 or Latin1 (for western european
> languages).
>
> Thanks,
> Alexander
>
>
> The following bug has been logged on the website:
>
> Bug reference: 6742
> Logged by: Alexander LAW
> Email address: exclusion(at)gmail(dot)com
> PostgreSQL version: 9.1.4
> Operating system: Windows
> Description:
>
> When I try to dump database with UTF-8 encoding in Windows, I get unreadable
> object names.
> Please look at the screenshot (http://oi50.tinypic.com/2lw6ipf.jpg). On the
> left window all the pg_dump messages displayed correctly (except for the
> prompt password (bug #6510)), but the non-ASCII object name is gibberish. On
> the right window (where dump is done with the Windows 1251 encoding (OS
> Encoding for Russian locale)) everything is right.
>
> Did you check the dump file using an editor that can handle UTF-8?
> The Windows console is not known for properly handling that encoding.
>
> Thomas
>
>
>
>
