xml type and encodings

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: xml type and encodings
Date: 2007-01-14 22:39:42
Message-ID: 200701142339.42297.peter_e@gmx.net
Lists: pgsql-hackers

We need to decide on how to handle encoding information embedded in xml
data that is passed through the client/server encoding conversion.

Here is an example:

Client encoding is A, server encoding is B. Client sends an xml datum
that looks like this:

INSERT INTO table VALUES (xmlparse(document '<?xml version="1.0"
encoding="C"?><content>...</content>'));

Assuming that A, B, and C are all distinct, this could fail at a number
of places.

I suggest that we make the system ignore all encoding declarations in
xml data. That is, in the above example, the string would actually
have to be encoded in client encoding A on the client, would be
converted to B on the server and stored as such. As far as I can tell,
this is easily implemented and allowed by the XML standard.
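
To make the proposed rule concrete, here is a minimal Python sketch of the
inbound path. It is an illustration only, not PostgreSQL code: the function
name client_to_server and the concrete encodings chosen for A, B, and C
(ISO-8859-1, UTF-8, KOI8-R) are assumptions made for the example.

```python
# Hypothetical stand-ins for the encodings in the example:
# A = ISO-8859-1 (client), B = UTF-8 (server), C = KOI8-R (declared in the XML).
CLIENT_ENCODING = "iso-8859-1"
SERVER_ENCODING = "utf-8"


def client_to_server(wire_bytes: bytes) -> bytes:
    """Convert an xml datum from the client encoding to the server encoding.

    The embedded encoding declaration is deliberately not consulted:
    the bytes are taken to be in the client encoding, whatever they claim.
    """
    text = wire_bytes.decode(CLIENT_ENCODING)
    return text.encode(SERVER_ENCODING)


# The datum declares KOI8-R but is really encoded in the client encoding.
datum = '<?xml version="1.0" encoding="KOI8-R"?><content>caf\u00e9</content>'
wire = datum.encode(CLIENT_ENCODING)
stored = client_to_server(wire)
```

In this sketch the declaration text is carried along unchanged; whether the
server should keep, rewrite, or strip it is a separate question.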

The same would be done on the way back. The datum would arrive in
encoding A on the client. It might be implementation-dependent whether
the datum actually contains an XML declaration specifying an encoding
and whether that encoding might read A, B, or C -- I haven't figured
that out yet -- but the client will always be required to consider it
to be A.
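
A similar sketch for the return path, again with hypothetical names and with
ISO-8859-1, UTF-8, and KOI8-R assumed for A, B, and C. It shows why the client
has to go by its own client encoding rather than by whatever declaration the
datum happens to carry:

```python
import re

# Hypothetical stand-ins: A = ISO-8859-1 (client), B = UTF-8 (server).
CLIENT_ENCODING = "iso-8859-1"
SERVER_ENCODING = "utf-8"


def server_to_client(stored: bytes) -> bytes:
    """Convert a stored xml datum from server to client encoding."""
    return stored.decode(SERVER_ENCODING).encode(CLIENT_ENCODING)


def declared_encoding(wire: bytes):
    """Return the encoding named in the XML declaration, or None."""
    match = re.search(rb'encoding="([^"]+)"', wire)
    return match.group(1).decode("ascii") if match else None


# Stored on the server in encoding B, still carrying a declaration that says C.
original = '<?xml version="1.0" encoding="KOI8-R"?><content>caf\u00e9</content>'
stored = original.encode(SERVER_ENCODING)
wire = server_to_client(stored)

# Correct under the proposal: treat the bytes as client encoding.
as_client_encoding = wire.decode(CLIENT_ENCODING)

# Incorrect: trusting the declaration garbles the non-ASCII character.
as_declared = wire.decode(declared_encoding(wire))
```

Here as_client_encoding round-trips to the original text, while as_declared
does not.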

What should be done about the binary send/receive functionality?
Looking at the send/receive functions for the text type, they
communicate all data in the server encoding, so it seems reasonable to
do this here as well.

Comments?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
