Re: Encoding problems in PostgreSQL with XML data

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Encoding problems in PostgreSQL with XML data
Date: 2004-01-09 20:44:14
Message-ID: 3FFF129E.6020109@dunslane.net
Lists: pgsql-hackers


Perhaps the document should be stored in canonical form. See
http://www.w3.org/TR/xml-c14n

I think I agree with Rod's opinion elsewhere in this thread. I guess the
"philosophical" question is this: If 2 XML documents with different
encodings have the same canonical form, or perhaps produce the same DOM,
are they equivalent? Merlin appears to want to say "no", and I think I
want to say "yes".
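
To make the question concrete, here is a minimal sketch (not something proposed in this thread) that builds two byte-wise different documents, one in ISO-8859-1 and one in UTF-8, and compares their canonical forms. It assumes Python 3.8 or later, whose standard library provides xml.etree.ElementTree.canonicalize; that function implements C14N 2.0, a successor of the recommendation linked above, so treat it as an illustration of the idea rather than of that exact spec. The helper name canonical_form and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def canonical_form(raw: bytes) -> str:
    """Parse raw XML bytes, honoring the declared encoding, and return
    the canonical (C14N 2.0) serialization as a Python string."""
    return ET.canonicalize(from_file=io.BytesIO(raw))


# Same logical content, two different encodings (hypothetical sample data).
body = "<name>Caf\u00e9</name>"
doc_latin1 = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")
doc_utf8 = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")

# Byte-level comparison: this is the "treat it as binary data" view.
bytes_equal = doc_latin1 == doc_utf8

# Canonical-form comparison: this is the "same canonical form" view.
canonical_equal = canonical_form(doc_latin1) == canonical_form(doc_utf8)

print("identical bytes:         ", bytes_equal)
print("identical canonical form:", canonical_equal)
```

Under these assumptions the byte comparison and the canonical comparison disagree, which is exactly the difference between the "no" and "yes" answers above.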

cheers

andrew

Merlin Moncure wrote:

>Peter Eisentraut wrote:
>
>
>>The central problem I have is this: How do we deal with the fact that
>>an XML datum carries its own encoding information?
>>
>>
>
>Maybe I am misunderstanding your question, but IMO postgres should be
>treating xml documents as if they were binary data, unless the server
>takes on the role of a parser, in which case it should handle
>unspecified/unknown encodings just like a normal xml parser would (and
>this does *not* include changing the encoding!).
>
>In my view, an XML parser should not change one bit of a document,
>because that would not be a 'parse' but a 'transformation'.
>
>
>
>>Rewriting the <?xml?> declaration seems like a workable solution, but it
>>would break the transparency of the client/server encoding conversion.
>>Also, some people might dislike that their documents are being changed
>>as they are stored.
>>
>>
>
>Right, your example begs the question: why does the server care what the
>encoding of the documents is (perhaps indexing)? XML validation is a
>standardized operation which the server (or psql, I suppose) can
>subcontract out to another application.
>
>Just a side thought: what if the xml encoding type was built into the
>domain type itself?
>create domain xml_utf8 ...
>Which allows casting, etc. which is more natural than an implicit
>transformation.
>
>Regards,
>Merlin
>
