Re: BUG #7493: Postmaster messages unreadable in a Windows console

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Alexander Law <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: BUG #7493: Postmaster messages unreadable in a Windows console
Date: 2013-02-10 23:47:30
Message-ID: 16160.1360540050@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-general pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> Following some actual testing, I see that we treat postgresql.conf values as
> byte sequences; any reinterpretation as encoded text happens later. Hence,
> contrary to my earlier suspicion, your patch does not make that situation
> worse. The present situation is bad; among other things, current_setting() is
> a vector for injecting invalid text data. But unconditionally validating
> postgresql.conf values in the platform encoding would not be an improvement.
> Suppose you have a UTF-8 platform encoding and KOI8R databases. You may wish
> to put KOI8R strings in a GUC, say search_path. That's possible today; if we
> required that postgresql.conf conform to the platform encoding and no other,
> it would become impossible. This area warrants improvement, but doing so will
> entail careful design.

The key problem, ISTM, is that it's not at all clear what encoding to
expect the incoming data to be in. I'm concerned about trying to fix
that by assuming it's in some "platform encoding" --- for one thing,
while that might be a well-defined concept on Windows, I don't believe
it is anywhere else.

If we knew that postgresql.conf was stored in, say, UTF8, then it would
probably be possible to perform encoding conversion to get string
variables into the database encoding. Perhaps we should allow some
magic syntax to tell us the encoding of a config file?

    file_encoding = 'utf8'   # must precede any non-ASCII in the file
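
A minimal sketch of how such a marker might be honored, written in Python purely for illustration. The function name decode_config, the regular expression, and the use of Python codec names in place of PostgreSQL encoding names are all assumptions; nothing here reflects an actual PostgreSQL implementation. It reads the file as bytes, insists on pure ASCII until the marker has been seen, and decodes every line from the marker onward with the declared encoding.

```python
import re

# Hypothetical directive syntax, modeled on the example line in the message.
_MARKER = re.compile(rb"^\s*file_encoding\s*=\s*'([A-Za-z0-9_-]+)'")


def decode_config(raw: bytes) -> str:
    """Decode config-file bytes using an in-file file_encoding marker."""
    encoding = None
    decoded_lines = []
    for lineno, line in enumerate(raw.splitlines(), start=1):
        if encoding is None:
            match = _MARKER.match(line)
            if match:
                # The marker itself is ASCII, so it can be read safely.
                encoding = match.group(1).decode("ascii")
            elif not line.isascii():
                # Without a marker there is no way to know what these bytes mean.
                raise ValueError(
                    f"line {lineno}: non-ASCII data before file_encoding"
                )
        # Lines before the marker are known to be ASCII at this point.
        decoded_lines.append(line.decode(encoding or "ascii"))
    return "\n".join(decoded_lines)
```

With this rule, a file that declares its encoding first decodes cleanly, while a file that places a non-ASCII value ahead of the marker is rejected rather than guessed at.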

There would still be a lot of practical problems to solve, like what to
do if we fail to convert some string into the database encoding. But at
least the problems would be somewhat well-defined.
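
To make the conversion-failure case concrete, here is a small Python sketch. The helper to_database_encoding is hypothetical, Python codec names stand in for PostgreSQL encoding names, and rejecting the value outright is only one of several possible policies; the message leaves the choice open.

```python
def to_database_encoding(value: str, db_encoding: str) -> bytes:
    """Convert an already-decoded setting into the database encoding.

    Raises ValueError when some character has no representation in the
    target encoding (one possible policy among several).
    """
    try:
        return value.encode(db_encoding)
    except UnicodeEncodeError as exc:
        bad = value[exc.start:exc.end]
        raise ValueError(
            f"cannot represent {bad!r} in {db_encoding}"
        ) from exc
```

A Cyrillic schema name converts without loss for a KOI8-R database but cannot be represented in a Latin-1 database, so the same configuration file would be acceptable to one database and not to another.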

While we're thinking about this, it'd be nice to fix our handling (or
rather lack of handling) of encoding considerations for database names,
user names, and passwords. I could imagine adding some sort of encoding
marker to connection request packets, which could fix the don't-know-
the-encoding problem as far as incoming data is concerned. But how
shall we deal with storing the strings in shared catalogs, which have to
be readable from multiple databases possibly of different encodings?
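
The shared-catalog difficulty can be illustrated with a few lines of Python. This is only a sketch: read_catalog_name is a hypothetical helper, and Python codecs stand in for database encodings. It shows what happens when bytes written under one encoding are read back under another.

```python
def read_catalog_name(stored: bytes, db_encoding: str) -> str:
    """Interpret raw catalog bytes as text in the reading database's encoding."""
    return stored.decode(db_encoding)


# A role or database name written by a session using KOI8-R.
original = "база"
stored = original.encode("koi8-r")

# Read from a KOI8-R database, the name round-trips.
same = read_catalog_name(stored, "koi8-r")

# Read from a Latin-1 database, the bytes decode "successfully" but to
# different characters, so the name silently changes.
garbled = read_catalog_name(stored, "latin-1")

# Read from a UTF-8 database, the bytes are not valid text at all.
try:
    read_catalog_name(stored, "utf-8")
    utf8_valid = True
except UnicodeDecodeError:
    utf8_valid = False
```

Knowing the encoding of the incoming string addresses only the write side; every reader of a shared catalog still needs either a common storage encoding or a per-entry conversion step.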

regards, tom lane
