Re: More message encoding woes

From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: More message encoding woes
Date: 2009-04-07 10:22:47
Message-ID: 49DB2977.7070506@tpf.co.jp
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Hiroshi Inoue wrote:
>> Heikki Linnakangas wrote:
>>> I just tried that, and it seems that gettext() does transliteration,
>>> so any characters that have no counterpart in the database encoding
>>> will be replaced with something similar, or question marks. Assuming
>>> that's universal across platforms, and I think it is, using the empty
>>> string should work.
>>>
>>> It also means that you can use lc_messages='ja' with
>>> server_encoding='latin1', but it will be unreadable because all the
>>> non-ascii characters are replaced with question marks. For something
>>> like lc_messages='es_ES' and server_encoding='koi8-r', it will still
>>> look quite nice.
>>>
>>> Attached is a patch I've been testing. Seems to work quite well. It
>>> would be nice if someone could test it on Windows, which seems to be
>>> a bit special in this regard.
>>
>> Unfortunately it doesn't seem to work on Windows.
>>
>> First, any combination of a valid lc_messages and a non-existent encoding
>> passes the test strcmp(gettext(""), "") != 0.
>
> Now that's strange. Can you check what gettext("") returns in that case
> then?

It returns the translated string, but without converting it. I'm not sure
whether that's a bug or not; I can find no description of what should be
returned in such a case.
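For reference, here is a minimal C sketch of the check being discussed. It assumes GNU gettext's bind_textdomain_codeset() and dgettext(). The helper name catalog_usable is hypothetical and is not the code from the patch. As described above, on Windows a translated but unconverted header can still make this return true for a non-existent encoding.

```c
#include <libintl.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: ask gettext to deliver messages from `domain`
 * in `codeset`, then look up the empty msgid. When a catalog is found,
 * gettext("") returns the PO header entry (Project-Id-Version,
 * Last-Translator, ...). Otherwise it returns the msgid itself, "".
 */
static bool catalog_usable(const char *domain, const char *codeset)
{
    if (bind_textdomain_codeset(domain, codeset) == NULL)
        return false;           /* the codeset could not be recorded */
    return strcmp(dgettext(domain, ""), "") != 0;
}
```

With no catalog installed for the domain (or in the "C" locale), the lookup falls back to the empty msgid and the helper returns false.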

>> Second, the combination of ja (lc_messages) and ISO-8859-1, for example,
>> passes the test, but the test fails after I changed the last_translator
>> part of the ja message catalog to contain Japanese kanji characters.
>
> Yeah, the inconsistency is not nice. In practice, though, if you try to
> use an encoding that can't represent kanji characters with Japanese,
> you're better off falling back to English than displaying strings full
> of question marks. The same goes for all other languages as well, IMHO.
> If you're going to fall back to English for some translations (and in
> practice "some" is a pretty high percentage) because the encoding is
> missing a character and transliteration is not working, you might as
> well not bother translating at all.

What is wrong with checking if the codeset is valid using iconv_open()?
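A minimal sketch of that suggestion. It assumes, without confirmation from this thread, that the message catalogs are stored in UTF-8 and that the codeset name is one that iconv recognizes. The name codeset_is_valid is hypothetical. iconv_open() returns (iconv_t) -1 when the requested conversion is not supported.

```c
#include <iconv.h>
#include <stdbool.h>

/*
 * Hypothetical helper: report whether iconv can convert from UTF-8
 * (assumed catalog encoding) into the named target codeset.
 */
static bool codeset_is_valid(const char *codeset)
{
    iconv_t cd = iconv_open(codeset, "UTF-8");

    if (cd == (iconv_t) -1)
        return false;           /* conversion not supported (errno EINVAL) */
    iconv_close(cd);            /* only probing, so release the descriptor */
    return true;
}
```

This only establishes that a conversion exists. It does not tell whether every character of a given translation can be represented in the target codeset, which is the kanji/ISO-8859-1 case above.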

regards,
Hiroshi Inoue
