Re: More message encoding woes

From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: More message encoding woes
Date: 2009-04-01 17:14:23
Message-ID: 49D3A0EF.8030505@tpf.co.jp
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>>> Tom Lane wrote:
>>>> Maybe use a special string "Translate Me First" that
>>>> doesn't actually need to be end-user-visible, just so no one sweats
>>>> over getting it right in context.
>>
>>> Yep, something like that. There seems to be a magic empty string
>>> translation at the beginning of every po file that returns the
>>> meta-information about the translation, like translation author and
>>> date. Assuming that works reliably, I'll use that.
>>
>> At first that sounded like an ideal answer, but I can see a gotcha:
>> suppose the translation's author's name contains some characters that
>> don't convert to the database encoding. I suppose that would result in
>> failure, when we'd prefer it not to. A single-purpose string could be
>> documented as "whatever you translate this to should be pure ASCII,
>> never mind if it's sensible".
>
> I just tried that, and it seems that gettext() does transliteration, so
> any characters that have no counterpart in the database encoding will be
> replaced with something similar, or question marks. Assuming that's
> universal across platforms, and I think it is, using the empty string
> should work.
>
> It also means that you can use lc_messages='ja' with
> server_encoding='latin1', but it will be unreadable because all the
> non-ascii characters are replaced with question marks.

That doesn't happen in the current Windows environment. With the
Windows GNU gettext we are using, we see the original msgid when
iconv can't convert the msgstr to the target codeset. For example,
with a LATIN1 server encoding:

set client_encoding to utf_8;
SET
show server_encoding;
server_encoding
-----------------
LATIN1
(1 row)

show lc_messages;
lc_messages
--------------------
Japanese_Japan.932
(1 row)

1;
ERROR: syntax error at or near "1"
LINE 1: 1;

OTOH, when the server encoding is UTF8:

set client_encoding to utf_8;
SET
show server_encoding;
server_encoding
-----------------
UTF8
(1 row)

show lc_messages;
lc_messages
--------------------
Japanese_Japan.932
(1 row)

1;
ERROR: "1"またはその近辺で構文エラー
LINE 1: 1;
        ^
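
The two behaviours discussed above (unconvertible characters replaced
with question marks, versus falling back to the untranslated msgid) can
be sketched roughly as follows. This is only an illustration using
Python's codecs; it does not call gettext or iconv, and the function
names are made up. The two strings are the already-formatted messages
from the sessions above, used as stand-ins for real catalog entries.

```python
# Illustrative sketch only: mimics the two behaviours with Python codecs.
# Function names are hypothetical; no gettext or iconv call is made.

# Stand-ins copied from the psql output above, not real catalog entries.
MSGID = 'syntax error at or near "1"'
MSGSTR = '"1"またはその近辺で構文エラー'


def translate_with_replacement(msgstr: str, codeset: str) -> str:
    """Characters with no counterpart in the codeset become '?'."""
    return msgstr.encode(codeset, errors="replace").decode(codeset)


def translate_with_msgid_fallback(msgid: str, msgstr: str, codeset: str) -> str:
    """Return the original msgid when msgstr cannot be converted."""
    try:
        msgstr.encode(codeset)  # strict conversion, raises on failure
    except UnicodeEncodeError:
        return msgid
    return msgstr


if __name__ == "__main__":
    for codeset in ("latin-1", "utf-8"):
        print(codeset, "replacement:", translate_with_replacement(MSGSTR, codeset))
        print(codeset, "fallback:   ",
              translate_with_msgid_fallback(MSGID, MSGSTR, codeset))
```

With latin-1 as the target, the fallback variant returns the English
msgid, which matches the first session; with utf-8 both variants return
the Japanese text, which matches the second.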
