Re: UTF8 with BOM support in psql

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: UTF8 with BOM support in psql
Date: 2009-11-17 15:50:02
Message-ID: 29075.1258473002@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> I think I could support using the presence of the BOM as a fall-back
> indicator of encoding in absence of any other declaration. It seems to
> me, however, that the description above ignores the existence of
> encodings other than SQL_ASCII and UTF8.

Yeah. This entire proposal rests on the assumption that UTF8 is the
only encoding that really matters, and that the risk of breaking things
for users of other encodings is acceptable damage.
I do not think that supporting a deprecated-by-standards behavior
is worth that.

Even assuming that we had consensus on a behavior that involved
silently changing client_encoding, I do not believe that it's practical
to implement it in an acceptable fashion. Just issuing a SET behind the
user's back will not work in a number of scenarios:

* We are inside a transaction when \i is called, and the file contains
a ROLLBACK.

* We are inside a failed transaction when \i is called --- the SET won't
even work at all.

* Same two cases inside a savepoint.

* The file contains a \c command.
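
As an illustration of the first, second, and fourth hazards, here is a
minimal Python sketch of the session state involved. It is a toy model,
not PostgreSQL or psql code: the ToySession class and its method names
are hypothetical, and savepoints are not modeled. It relies only on two
documented backend behaviors: a plain SET is undone by ROLLBACK, and an
aborted transaction rejects further commands.

```python
class ToySession:
    """Toy model of one backend session (hypothetical, not PostgreSQL code)."""

    def __init__(self, client_encoding="LATIN1"):
        self.client_encoding = client_encoding
        self.in_transaction = False
        self.failed = False
        self._saved = None  # encoding in effect when the transaction began

    def begin(self):
        self.in_transaction = True
        self.failed = False
        self._saved = self.client_encoding

    def fail(self):
        # Any error inside a transaction block leaves it in the aborted state.
        if self.in_transaction:
            self.failed = True

    def set_client_encoding(self, value):
        # An aborted transaction rejects commands until it is rolled back.
        if self.in_transaction and self.failed:
            return False
        self.client_encoding = value
        return True

    def rollback(self):
        # A plain SET is transactional, so ROLLBACK restores the old value.
        if self.in_transaction:
            self.client_encoding = self._saved
        self.in_transaction = False
        self.failed = False

    def reconnect(self, default_encoding):
        # \c opens a new session, which starts from its own default.
        self.__init__(default_encoding)


session = ToySession("LATIN1")
session.begin()
session.set_client_encoding("UTF8")  # the hidden SET issued on behalf of \i
session.rollback()                   # a ROLLBACK found inside the included file
print(session.client_encoding)       # prints LATIN1; the rest of the file is still UTF8
```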

If you expect that the previous client_encoding should be restored at
the end of the \i inclusion (as I certainly would) then you have the
first three hazards at file end as well, except that now the odds of
being inside a failed transaction are significantly higher. Also,
what if the file contained a SET CLIENT_ENCODING command itself?
How should that interact with this?

Lastly, a silent change of client_encoding would also affect the
encoding of notice and error messages that come out while the \i
file is running. I would find that astonishing, too.

I think that the only way this sort of behavior could be implemented
without a bunch of broken corner cases would be if we put the
responsibility of encoding conversion inside psql, so that switching its
idea of the encoding was just a local change rather than something it
had to ask the backend to do, and it could be careful to apply the
encoding only to the data coming from the \i file. Which is possible,
perhaps, but it hardly seems that slightly-more-convenient BOM handling
is worth it.
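
For concreteness, a client-side approach along these lines might look
like the Python sketch below. It is an assumption-laden illustration,
not how psql (which is written in C) works: the function name
transcode_included_file and its parameters are hypothetical, and the
encoding names are Python codec names rather than PostgreSQL ones. The
BOM is used only as a fall-back when no encoding is declared, and only
the file's bytes are converted, so the session's client_encoding is
never touched.

```python
import codecs


def transcode_included_file(raw, client_encoding, declared_encoding=None):
    """Re-encode the bytes of an included file into client_encoding.

    Hypothetical helper: no SET is sent to the server, so notices, errors,
    and transaction state are unaffected by the conversion.
    """
    if declared_encoding is not None:
        # An explicit declaration always wins over BOM sniffing.
        source_encoding = declared_encoding
    elif raw.startswith(codecs.BOM_UTF8):
        # Fall-back: a UTF-8 BOM marks the file as UTF-8; drop the marker.
        source_encoding = "utf-8"
        raw = raw[len(codecs.BOM_UTF8):]
    else:
        # No declaration and no BOM: assume the file already matches.
        source_encoding = client_encoding
    return raw.decode(source_encoding).encode(client_encoding)


utf8_file = codecs.BOM_UTF8 + "SELECT 'é';".encode("utf-8")
converted = transcode_included_file(utf8_file, "latin-1")
print(converted == "SELECT 'é';".encode("latin-1"))  # prints True
```

A character that cannot be represented in the target encoding raises
UnicodeEncodeError in this sketch; a real client would need to decide
how to report that.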

regards, tom lane
