Re: [bug fix] multibyte messages are displayed incorrectly on the client

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, MauMau <maumau307(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [bug fix] multibyte messages are displayed incorrectly on the client
Date: 2014-06-23 13:57:22
Message-ID: 53A83242.9010503@vmware.com
Lists: pgsql-hackers

On 04/05/2014 07:56 AM, Tom Lane wrote:
> "MauMau" <maumau307(at)gmail(dot)com> writes:
>>> Then, as a happy medium, how about disabling message localization only if
>>> the client encoding differs from the server one? That is, compare the
>>> client_encoding value in the startup packet with the result of
>>> GetPlatformEncoding(). If they don't match, call
>>> disable_message_localization().
>
> AFAICT this is not what was agreed to in this thread. It puts far too
> much credence in the server-side default for client_encoding, which up to
> now has never been thought to be very interesting; indeed I doubt most
> people bother to set it at all. The reason that this issue is even on
> the table is that that default is too likely to be wrong, no?
>
> Also, whatever possessed you to use pg_get_encoding_from_locale to
> identify the server's encoding? That's expensive and seems fairly
> unlikely to yield the right answer. I don't remember offhand where we
> keep the postmaster's idea of what encoding messages should be in, but I'm
> fairly sure it's stored explicitly somewhere. Or if it isn't, we can for
> sure do better than recalculating it during every connection attempt.
>
> Having said all that, though, I'm unconvinced that this cure isn't worse
> than the disease. Somebody claimed upthread that no very interesting
> messages would be delocalized by a change like this, but that's complete
> nonsense: in particular, *every* message associated with client
> authentication will be sent in English if we go down this path. Given
> the nearly complete lack of complaints in the many years that this code
> has worked like this, I'm betting that most people will find a change
> like this to be a net reduction in friendliness.
>
> Given the changes here to extract client_encoding from the startup packet
> ASAP, I wonder whether the right thing isn't just to set the client
> encoding immediately when we do that. Most application libraries pass
> client encoding in the startup packet anyway (libpq certainly does).

Based on Tom's comments above, I'm marking this as returned with
feedback in the commitfest. I agree that setting client_encoding as
early as possible seems like the right thing to do.
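
To illustrate what "as early as possible" could look like, here is a
minimal sketch of pulling client_encoding out of the startup packet body.
It is not taken from any patch in this thread. The function name
startup_option() is hypothetical; the only thing assumed from the protocol
is that, after the 4-byte version number, the packet carries NUL-terminated
name/value string pairs followed by a single terminating NUL byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a PostgreSQL function: look up one parameter in
 * the body of a startup packet.  "buf" points just past the 4-byte protocol
 * version; "len" is the number of bytes remaining in the packet.
 *
 * Returns a pointer to the value inside buf, or NULL if the name is absent
 * or the packet is malformed (a name or value without a terminating NUL).
 */
static const char *
startup_option(const char *buf, size_t len, const char *name)
{
    size_t      pos = 0;

    assert(buf != NULL && name != NULL);

    /* An empty name (a lone NUL byte) marks the end of the list. */
    while (pos < len && buf[pos] != '\0')
    {
        const char *key = buf + pos;
        const char *key_end = memchr(key, '\0', len - pos);
        const char *val;
        const char *val_end;

        if (key_end == NULL)
            return NULL;        /* name is not terminated */
        pos = (size_t) (key_end - buf) + 1;
        if (pos >= len)
            return NULL;        /* name without a value */

        val = buf + pos;
        val_end = memchr(val, '\0', len - pos);
        if (val_end == NULL)
            return NULL;        /* value is not terminated */
        pos = (size_t) (val_end - buf) + 1;

        if (strcmp(key, name) == 0)
            return val;
    }
    return NULL;
}
```

A caller would ask for "client_encoding" right after reading the packet and
fall back to the server-side default when the function returns NULL. The
hard part, as discussed below, is what to do with that value before the
catalogs are available.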

Earlier in this thread, MauMau pointed out that we can't do encoding
conversions until we have connected to the database because you need to
read pg_conversion for that. That's because we support creating custom
conversions with CREATE CONVERSION. Frankly, I don't think anyone cares
about that feature. If we just dropped the CREATE/DROP CONVERSION
feature altogether and hard-coded the conversions we have, there would
be close to zero complaints. Even if you want to extend something around
encodings and conversions, the CREATE CONVERSION interface is clunky.
Firstly, conversions are per-database, and even schema-qualified, which
just seems like an extra complication. You'll most likely want to modify
the conversion across the whole system. Secondly, rather than define a
new conversion between encodings, you'll likely want to define a whole
new encoding with conversions to/from existing encodings, but you can't
do that anyway without hacking the source code.
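
For what it's worth, a hard-coded lookup could be as simple as the sketch
below. This is only an illustration under my own assumptions, not how the
backend is organized today: the names conv_fn, builtin_conversions and
find_builtin_conversion() are made up, and LATIN1 to UTF8 is used as the
single example entry because it is trivial to write out in full. The point
is that nothing here needs a database connection or a pg_conversion lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical conversion routine signature: returns the number of bytes
 * written to dst, or (size_t) -1 if dst is too small.
 */
typedef size_t (*conv_fn) (const unsigned char *src, size_t srclen,
                           unsigned char *dst, size_t dstlen);

/* LATIN1 -> UTF8: bytes below 0x80 are copied, the rest become two bytes. */
static size_t
latin1_to_utf8(const unsigned char *src, size_t srclen,
               unsigned char *dst, size_t dstlen)
{
    size_t      out = 0;

    for (size_t i = 0; i < srclen; i++)
    {
        unsigned char c = src[i];

        if (c < 0x80)
        {
            if (out + 1 > dstlen)
                return (size_t) -1;
            dst[out++] = c;
        }
        else
        {
            if (out + 2 > dstlen)
                return (size_t) -1;
            dst[out++] = (unsigned char) (0xC0 | (c >> 6));
            dst[out++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
    return out;
}

/* Compiled-in table; a real one would list every supported pair. */
struct builtin_conversion
{
    const char *from;
    const char *to;
    conv_fn     fn;
};

static const struct builtin_conversion builtin_conversions[] = {
    {"LATIN1", "UTF8", latin1_to_utf8},
};

/* Find a conversion by encoding names without touching any catalog. */
static conv_fn
find_builtin_conversion(const char *from, const char *to)
{
    size_t      n = sizeof(builtin_conversions) / sizeof(builtin_conversions[0]);

    assert(from != NULL && to != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(builtin_conversions[i].from, from) == 0 &&
            strcmp(builtin_conversions[i].to, to) == 0)
            return builtin_conversions[i].fn;
    }
    return NULL;                /* no such built-in conversion */
}
```

With a table like this, the conversion for the client's encoding could be
resolved during startup packet processing, before any database is opened.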

- Heikki
