Re: More message encoding woes

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: More message encoding woes
Date: 2009-04-07 09:41:18
Message-ID: 49DB1FBE.3040001@enterprisedb.com
Lists: pgsql-hackers

Hiroshi Inoue wrote:
> Heikki Linnakangas wrote:
>> I just tried that, and it seems that gettext() does transliteration,
>> so any characters that have no counterpart in the database encoding
>> will be replaced with something similar, or question marks. Assuming
>> that's universal across platforms, and I think it is, using the empty
>> string should work.
>>
>> It also means that you can use lc_messages='ja' with
>> server_encoding='latin1', but it will be unreadable because all the
>> non-ascii characters are replaced with question marks. For something
>> like lc_messages='es_ES' and server_encoding='koi8-r', it will still
>> look quite nice.
>>
>> Attached is a patch I've been testing. Seems to work quite well. It
>> would be nice if someone could test it on Windows, which seems to be a
>> bit special in this regard.
>
> Unfortunately it doesn't seem to work on Windows.
>
> First, any combination of a valid lc_messages and a non-existent
> encoding passes the test strcmp(gettext(""), "") != 0.

Now that's strange. Can you check what gettext("") returns in that case
then?
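For readers following along, here is a minimal sketch of the kind of check being discussed. It is not the actual patch. The helper name catalog_usable is hypothetical, and it assumes GNU gettext (as in glibc), where dgettext() and bind_textdomain_codeset() are available and the caller has already set LC_MESSAGES with setlocale(). When a catalog is found, gettext("") returns the catalog's header entry converted to the requested codeset. When no catalog is found, it returns the msgid itself, i.e. the empty string. As Hiroshi reports above, Windows does not seem to behave this way, so the check can give false positives there.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical helper, not the patch itself. Binds `domain` to `localedir`,
 * asks gettext to convert its output to `codeset`, and reports whether
 * a message catalog was found, by checking whether the header entry
 * (msgid "") comes back non-empty. */
static int
catalog_usable(const char *domain, const char *localedir, const char *codeset)
{
    if (bindtextdomain(domain, localedir) == NULL)
        return 0;                       /* out of memory */
    if (bind_textdomain_codeset(domain, codeset) == NULL)
        return 0;                       /* out of memory */

    /* No catalog for the current LC_MESSAGES locale: gettext returns the
     * msgid unchanged, so the result is "" and the check fails. */
    return strcmp(dgettext(domain, ""), "") != 0;
}
```

A caller would run this once after choosing lc_messages and the database encoding, and skip translation (fall back to English) if it returns 0.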

> Second, for example, the combination of ja (lc_messages) and ISO-8859-1
> passes the test, but the test fails after I changed the last_translator
> part of the ja message catalog to contain Japanese kanji characters.

Yeah, the inconsistency is not nice. In practice, though, if you try to
use an encoding that can't represent kanji characters with Japanese,
you're better off falling back to English than displaying strings full
of question marks. The same goes for all other languages as well, IMHO.
If you're going to fall back to English for some translations (and in
practice "some" is a pretty high percentage) because the encoding is
missing a character and transliteration is not working, you might as
well not bother translating at all.

If we add the dummy translations to all .po files, we could force
fallback-to-English in situations like that by including some or all of
the non-ASCII characters used in the language in the dummy translation.
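To make the "missing character" case concrete, the sketch below checks whether a UTF-8 string can be represented in a target encoding without loss. It uses POSIX iconv() without the glibc //TRANSLIT suffix, so unmappable characters make the conversion fail instead of being replaced with look-alikes or question marks, which is what gettext's transliteration does. This is a swapped-in illustration, not how gettext or the patch works. The helper name representable_in is hypothetical, and the encoding names assume glibc's iconv.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every character of the UTF-8 string
 * `utf8` converts to `target_encoding` exactly, 0 if any character would
 * be lost, and -1 if the conversion cannot be set up at all. */
static int
representable_in(const char *utf8, const char *target_encoding)
{
    iconv_t cd = iconv_open(target_encoding, "UTF-8");
    if (cd == (iconv_t) -1)
        return -1;                      /* unsupported encoding pair */

    size_t inleft = strlen(utf8);
    size_t outsize = inleft * 4 + 4;    /* generous for any target encoding */
    char *out = malloc(outsize);
    if (out == NULL)
    {
        iconv_close(cd);
        return -1;
    }

    char *inbuf = (char *) utf8;        /* iconv() takes char **, not const */
    char *outbuf = out;
    size_t outleft = outsize;
    size_t r = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);

    free(out);
    iconv_close(cd);

    /* (size_t) -1: an unmappable or invalid sequence stopped the conversion.
     * A positive count: some characters were converted irreversibly.
     * Only an exact conversion returns 0. */
    return r == 0 ? 1 : 0;
}
```

With this, Spanish text survives Latin-1 but not KOI8-R (the "ñ" has no KOI8-R counterpart, which is exactly where transliteration steps in), and kanji survive neither.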

I'm thinking of going ahead with this approach, without the dummy
translation, after we have resolved the first issue on Windows. We can
add the dummy translations later if needed, but I don't think anyone
will care.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
