Re: encoding of PostgreSQL messages

From: "Karsten Hilbert" <Karsten(dot)Hilbert(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: encoding of PostgreSQL messages
Date: 2008-12-31 16:57:29
Message-ID: 20081231165729.284240@gmx.net
Lists: pgsql-general

> Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net> writes:
> > On Mon, Dec 29, 2008 at 09:07:14AM -0300, Alvaro Herrera wrote:
> >> And I'm now wondering if we should delay initializing the translation
> >> stuff until after client_encoding has been reported.
>
> > Or else
>
> > - just don't pass those messages through gettext so they are
> > always in 7 bit ASCII English
>
> What's the difference? The user-visible result would be the same
> AFAICS. (One or the other might be less messy internally, but I'm
> not sure which offhand.)

That was the reason for the suggestion: it is perhaps less messy and surely has a lower impact on the
existing code, since it would not mean moving code to a later point in the initialization but rather just
removing the gettext wrappers around a few strings. There is no difference in the result.

The difference from my other suggestion (no translation vs. translation followed by replacing
characters > 127 with, say, '?' or a space) is:

I could *assume* a given encoding, namely 7 bit ASCII. Or rather I could assume
that I can display the message as "something pretty similar to what the original message said,
perhaps without umlauts and accents but still recognizable in the local language".
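
To illustrate the second variant, here is a minimal sketch in Python 3 of such a filter. The function name and the sample bytes are made up for illustration; nothing like this exists in PostgreSQL or libpq as far as I know.

```python
def to_ascii_filtered(raw: bytes, placeholder: str = "?") -> str:
    """Return a pure 7-bit ASCII string for a message of unknown encoding.

    Every byte above 127 is replaced by the placeholder, so the result can
    be displayed on any console without knowing the original encoding.
    """
    return "".join(chr(b) if b < 128 else placeholder for b in raw)


# Hypothetical message fragment, here assumed to arrive as latin1 bytes.
raw = "Passwort-Authentifizierung für Benutzer".encode("latin-1")
print(to_ascii_filtered(raw))
```

The filter works on bytes rather than characters, so in a multi-byte encoding such as UTF-8 one umlaut would turn into two placeholders. The message stays recognizable either way.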

Now, surely, I could dig down the layers to where "my application space" receives the message
from PostgreSQL and filter there. It is, however, good to have some knowledge of the encoding
where knowledge can be had.

The concrete problem is this: I connect to PostgreSQL from Python. Let's assume PG is set to German.
If the wrong password is supplied, the PG error message string contains an umlaut. The message is passed
to libpq, which in turn passes it to the C part of psycopg2, which then turns it into an exception. By
default, Python prints an exception to the console, which may use an encoding incompatible with the
latin1 the PG message happens to be in. Thus, printing the PG message may or may not fail due to
Unicode de-/encoding errors.
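
The failure can be reproduced without a database. The following is only a sketch in Python 3: the sample text is a hypothetical stand-in for the real server message, and the helper names are mine.

```python
# Hypothetical stand-in for the server message; the real wording may differ.
SAMPLE = "Passwort-Authentifizierung für Benutzer fehlgeschlagen"
raw = SAMPLE.encode("latin-1")  # the bytes as they would arrive from the server


def decodes_as(data: bytes, encoding: str) -> bool:
    """True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def printable_on(text: str, console_encoding: str) -> bool:
    """True if the text can be encoded for a console using that encoding."""
    try:
        text.encode(console_encoding)
        return True
    except UnicodeEncodeError:
        return False


print(decodes_as(raw, "utf-8"))       # wrong guess about the message encoding
print(printable_on(SAMPLE, "ascii"))  # console that cannot represent the umlaut
```

Both checks fail for this sample: the latin1 byte for the umlaut is not valid UTF-8, and an ASCII console cannot represent the character at all.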

The solution is to find the right layer at which to take control of the encoding, but ultimately this is
only possible if the encoding is *known*. Thus the plea for "7-bit-ascii English by default until the
encoding *can* be known". Going to "7-bit-ascii filter of the original by default until the encoding can be
known" only tries to preserve a bit more of the original language. I may be wrong about the feasibility.
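
On the application side, such a layer could look roughly like the sketch below (Python 3). The function name and the idea of passing the encoding in as a parameter are assumptions for illustration; this is not how psycopg2 handles messages.

```python
from typing import Optional


def decode_server_message(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode a server message, using the encoding only if it is known.

    If the encoding is unknown, not recognized by Python, or does not match
    the bytes, fall back to a 7-bit ASCII filter so that the result can
    always be displayed.
    """
    if encoding is not None:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            pass  # wrong or unsupported encoding: use the fallback below
    # Fallback: keep ASCII bytes, replace everything else with '?'.
    return "".join(chr(b) if b < 128 else "?" for b in raw)


message = b"f\xfcr"  # hypothetical latin1 bytes containing an umlaut
print(decode_server_message(message, "latin-1"))  # encoding known
print(decode_server_message(message))             # encoding not yet known
```

The fallback loses the umlaut but never raises, which is the property I am after for messages sent before the encoding has been negotiated.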

Thanks for considering,
Karsten
