XML - DOCTYPE element - documentation suggestion

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: GENERAL <pgsql-general(at)postgresql(dot)org>
Subject: XML - DOCTYPE element - documentation suggestion
Date: 2010-06-17 18:43:22
Message-ID: 4C1A6CCA.30203@postnewspapers.com.au
Lists: pgsql-general

Hi all

I've been working with XML storage in Pg and was puzzled to find that Pg
appeared to refuse to store a document with a DOCTYPE declaration: it
interpreted the declaration as a regular element and rejected it.

This turns out to be because Pg parses XML as a fragment (i.e. with
option CONTENT) when casting, and XML fragments cannot have a doctype.
Unfortunately the error is ... unhelpful ... and the documentation
neglects to mention this issue. Hence my post.

I didn't see anything about this in the FAQ or in the docs for the XML
datatype
(http://www.postgresql.org/docs/current/interactive/datatype-xml.html)
and was half-way through writing this post when I found a helpful
message on the list:

http://www.mail-archive.com/pgsql-general@postgresql.org/msg119713.html

that hinted the way. Even then it took me a while to figure out that you
can't specify DOCUMENT or CONTENT on the XML type itself, but must
specify it while parsing instead, and use a CHECK constraint if you want
to require storage of whole documents in a field.
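
For illustration, a minimal sketch of the CHECK-constraint approach,
assuming the IS DOCUMENT predicate described in the XML functions
documentation. The table name test_xml_docs is made up for this example,
and I have not run it against every server version.

```sql
-- Hypothetical table: the column type is plain xml, because the type
-- itself takes no DOCUMENT/CONTENT modifier. The CHECK constraint is what
-- restricts the column to whole documents.
CREATE TABLE test_xml_docs (
    doc xml CHECK (doc IS DOCUMENT)
);

-- Should be accepted: a complete document with a single root element.
INSERT INTO test_xml_docs (doc)
VALUES (XMLPARSE(DOCUMENT '<test>dummy content</test>'));

-- Should be rejected by the constraint: this parses fine as CONTENT,
-- but a fragment with two top-level elements is not a document.
INSERT INTO test_xml_docs (doc)
VALUES (XMLPARSE(CONTENT '<a>one</a><b>two</b>'));
```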

The xml datatype documentation should probably mention that whole
documents must be loaded with XMLPARSE(DOCUMENT 'doc_text_here'); they
cannot just be cast from text to xml, as happens when you pass an xml
document as text to a parameter during an INSERT. This should probably
appear under "CREATING XML VALUES" in:

http://www.postgresql.org/docs/current/static/datatype-xml.html
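
Something along these lines is what I have in mind for the docs: a
hedged sketch contrasting the two XMLPARSE modes on the same input. The
document text is just a dummy, and the behaviour of the CONTENT case is
what I observed on the server I tested; other versions may differ.

```sql
-- Parsed as a whole document, so the DOCTYPE declaration is allowed.
SELECT XMLPARSE(DOCUMENT
$$<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE test SYSTEM 'test.dtd'><test>dummy content</test>$$);

-- Parsed as content (a fragment). On the server discussed in this post
-- the same text is rejected with "invalid XML content", which is also
-- what a plain cast from text to xml does by default.
SELECT XMLPARSE(CONTENT
$$<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE test SYSTEM 'test.dtd'><test>dummy content</test>$$);
```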

... and probably deserves mention in a new "CAVEATS" or "NOTES" section
too, as it *will* catch people out even if they R TFM.

I'd expect this to work:

CREATE TABLE test_xml ( doc xml );

INSERT INTO test_xml ( doc ) VALUES (
$$<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE test SYSTEM 'test.dtd'><test>dummy content</test>$$
);

... but it fails with:

ERROR: invalid XML content
LINE 2: $$<?xml version="1.0" encoding="utf-8"?>
^
DETAIL: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE test SYSTEM 'test.dtd'><test>dummy content</test>
^

though xmllint (from libxml) is quite happy with the document. This had
me quite confused for a while.
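
For anyone hitting the same thing, a sketch of a session-level
workaround, assuming the xmloption setting (default CONTENT) is what
governs the implicit cast. It reuses the test_xml table from above; treat
it as something to verify on your own server rather than a definitive
recipe.

```sql
-- Make implicit text-to-xml conversions parse their input as DOCUMENT
-- for the rest of this session.
SET xmloption TO DOCUMENT;

-- The same INSERT as before; with xmloption = DOCUMENT the literal
-- should be parsed as a whole document, DOCTYPE included.
INSERT INTO test_xml ( doc ) VALUES (
$$<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE test SYSTEM 'test.dtd'><test>dummy content</test>$$
);

-- Restore the default so later casts of fragments keep working.
RESET xmloption;
```

The catch is that with xmloption set to DOCUMENT, casting a fragment from
text fails instead, so wrapping individual values in XMLPARSE(DOCUMENT ...)
is probably the less surprising choice.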

--
Craig Ringer
