Re: main log encoding problem

From: Alexander Law <exclusion(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-general(at)postgresql(dot)org, ringerc(at)ringerc(dot)id(dot)au, yi(dot)codeplayer(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: main log encoding problem
Date: 2012-07-19 06:37:49
Message-ID: 5007AB3D.3010501@gmail.com
Lists: pgsql-bugs pgsql-general pgsql-hackers

Hello,

>> C. We have one logfile with UTF-8.
>> Pros: Log messages of all our clients can fit in it. We can use any
>> generic editor/viewer to open it.
>> Nothing changes for Linux (and other OSes with UTF-8 encoding).
>> Cons: All the strings written to log file should go through some
>> conversion function.
>>
>> I think that the last solution is the solution. What is your opinion?
> I am thinking about variant of C.
>
> The problem with C is that converting from other encodings to UTF-8 is
> not cheap because it requires huge conversion tables. This may be a
> serious problem on a busy server. It is also possible that some
> information is lost in this conversion, because there is no
> guarantee of a one-to-one mapping between UTF-8 and other
> encodings. Another problem with UTF-8 is that you have to choose *one*
> locale when using your editor. This may or may not affect handling of
> strings in your editor.
>
> My idea is using mule-internal encoding for the log file instead of
> UTF-8. There are several advantages:
>
> 1) Conversion to mule-internal encoding is cheap because no conversion
> table is required. Also no information loss happens in this
> conversion.
>
> 2) Mule-internal encoding can be handled by emacs, one of the most
> popular editors in the world.
>
> 3) No need to worry about locale. Mule-internal encoding has enough
> information about language.
> --
>
I believe that postgres has such conversion functions anyway, and they
are used for data conversion when we have clients (and databases) with
different encodings. So if they can be used for data, why not use
them for the relatively small volume of log messages?
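
To make the idea concrete, here is a minimal sketch of funnelling
messages from clients with different encodings into one UTF-8 stream.
It uses Python's built-in codecs as a stand-in, not the PostgreSQL
conversion functions, and the names to_utf8, round_trips and samples
are hypothetical. It only checks the sample strings shown; it says
nothing about every character of every encoding.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a message from source_encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def round_trips(raw: bytes, source_encoding: str) -> bool:
    """Return True if raw survives source -> UTF-8 -> source unchanged."""
    utf8 = to_utf8(raw, source_encoding)
    return utf8.decode("utf-8").encode(source_encoding) == raw


# Hypothetical log messages arriving in three different client encodings.
samples = [
    ("ошибка".encode("cp1251"), "cp1251"),
    ("ошибка".encode("koi8_r"), "koi8_r"),
    ("エラー".encode("euc_jp"), "euc_jp"),
]

# One boolean per sample: did the message come back byte-for-byte identical?
results = [round_trips(raw, enc) for raw, enc in samples]
```

A round-trip test of this kind is also one way to look for the lossy
cases: any byte sequence for which round_trips returns False (or for
which decoding raises an error) would be a candidate example.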
And regarding mule internal encoding - reading about Mule
http://www.emacswiki.org/emacs/UnicodeEncoding I found:
/In future (probably Emacs 22), Mule will use an internal encoding which
is a UTF-8 encoding of a superset of Unicode. /
So I still see UTF-8 as a common denominator for all the encodings.
I am not aware of any characters absent from Unicode. Can you please
provide some examples of characters that would result in lossy
conversion?
Choosing UTF-8 in a viewer/editor is no big deal either. Most of them
detect UTF-8 automagically, and for the others a BOM can be added.
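
As a rough illustration of the BOM point, the sketch below prepends
the UTF-8 byte order mark to a payload and shows a simple detection
heuristic of the kind a viewer might apply. The helper names with_bom
and looks_like_utf8 are hypothetical, and real editors may use
different or more elaborate detection rules.

```python
import codecs


def with_bom(utf8_payload: bytes) -> bytes:
    """Prefix already-UTF-8 bytes with the UTF-8 BOM (EF BB BF)."""
    return codecs.BOM_UTF8 + utf8_payload


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: a BOM settles it; otherwise try a strict UTF-8 decode."""
    if data.startswith(codecs.BOM_UTF8):
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


log_line = with_bom("ошибка".encode("utf-8"))

# The "utf-8-sig" codec strips the BOM again when reading.
text = log_line.decode("utf-8-sig")
```

Note that the strict-decode fallback can misclassify short inputs:
pure ASCII, for instance, is valid UTF-8 and valid in most legacy
encodings at the same time, which is where an explicit BOM helps.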

Best regards,
Alexander
