Re: main log encoding problem

From: Alexander Law <exclusion(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-general(at)postgresql(dot)org, ringerc(at)ringerc(dot)id(dot)au, yi(dot)codeplayer(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: main log encoding problem
Date: 2012-07-19 08:21:45
Message-ID: 5007C399.6000405@gmail.com
Lists: pgsql-bugs pgsql-general pgsql-hackers


>> And regarding mule internal encoding - reading about Mule
>> http://www.emacswiki.org/emacs/UnicodeEncoding I found:
>> /In future (probably Emacs 22), Mule will use an internal encoding
>> which is a UTF-8 encoding of a superset of Unicode. /
>> So I still see UTF-8 as a common denominator for all the encodings.
>> I am not aware of any characters absent in Unicode. Can you please
>> provide some examples of these that can result in lossy conversion?
> You can search for "encoding "EUC_JP" has no equivalent in "UTF8"" or
> some such to find such an example. In this case PostgreSQL just throws
> an error. For frontend/backend encoding conversion this is fine. But
> what should we do for logs? Apparently we cannot throw an error here.
>
> "Unification" is another problem. Some kanji characters of CJK are
> "unified" in Unicode. The idea of unification is, if kanji A in China,
> B in Japan, C in Korea look "similar", unify A, B and C into D. This is a great
> space saving:-) The price of this is the inability to do a
> round-trip conversion. You can convert A, B or C to D, but you cannot
> convert D back to A/B/C.
>
> BTW, I'm not wedded to mule-internal encoding. What we need here is a
> "super" encoding which could include any existing encoding without
> information loss. For this purpose, I think we can even invent a new
> encoding (maybe something like the very first proposal of ISO/IEC
> 10646?). However, using UTF-8 for this purpose seems to be just a
> disaster to me.
>
OK, maybe the time for a truly universal encoding has not yet come. Then
maybe we should just add a new parameter "log_encoding" (UTF-8 by default)
to postgresql.conf and use this encoding consistently within
logging_collector.
If this encoding is not available, then fall back to 7-bit ASCII.
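
To make the proposal a bit more concrete, here is a rough sketch of the
fallback logic in Python (not PostgreSQL code, just an illustration). The
function name encode_log_line is hypothetical, and "log_encoding" is only the
parameter proposed above; it does not exist in postgresql.conf today. I am
also assuming that characters the target encoding cannot represent are
escaped instead of raising an error, since the logger cannot throw one.

```python
import codecs


def encode_log_line(message: str, log_encoding: str = "utf-8") -> bytes:
    """Encode one log line in log_encoding, falling back to 7-bit ASCII.

    Hypothetical helper: 'log_encoding' mirrors the proposed setting,
    it is not an existing PostgreSQL parameter.
    """
    try:
        # Check that the requested encoding is known on this system.
        codecs.lookup(log_encoding)
    except LookupError:
        # Encoding is not available: fall back to 7-bit ASCII.
        log_encoding = "ascii"

    # Escape characters the target encoding cannot represent instead of
    # raising an error (assumption: the logger must never fail here).
    return message.encode(log_encoding, errors="backslashreplace")
```

With the default, a message in any language is written as UTF-8. With an
unknown encoding name, the line is written as ASCII with non-ASCII
characters escaped, so the log file stays in one consistent encoding.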
