Re: XMLDocument (SQL/XML X030)

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Chapman Flack <jcflack(at)acm(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: XMLDocument (SQL/XML X030)
Date: 2025-01-20 19:56:28
Message-ID: f44eb6cc-e0be-4041-a374-6231b9dcaefb@uni-muenster.de
Lists: pgsql-hackers

Hi Chap,

Thanks for the thorough explanation! 

On 20.01.25 20:09, Chapman Flack wrote:
>> PostgreSQL does not support the RETURNING SEQUENCE or RETURNING CONTENT
>> clauses explicitly. Instead, it implicitly uses RETURNING CONTENT[2] in
>> functions that require it. Since RETURNING CONTENT implies that the
>> output is a well-formed XML document (e.g., single-rooted),
> In fact, you can't infer single-root-element-ness from RETURNING CONTENT,
> according to the standard. Single-root-element-ness is checked by the
> IS DOCUMENT predicate, and by XMLPARSE and XMLSERIALIZE when they specify
> DOCUMENT. But it isn't checked or implied by the XMLDOCUMENT constructor.
>
> That amounts to a bit of unfortunate punning on the word DOCUMENT,
> but so help me that's what's in the standard.

Yeah, the term DOCUMENT seems a bit misleading in this context.

>
> It may help to think in terms of the hierarchy of XML types that the
> 2006 standard introduced (cribbed here from [3]):
>
>                 SEQUENCE
>                     |
> (?sequence of length 1, a document node)
>                     |
>               CONTENT(ANY)-------------------.---------------(?every element
>                     |                        |                 conforms to a
>         (?every element has           (?no extraneous          schema)
>          xdt:untyped and !nilled,      nodes)                       |
>          every attribute has                 |                      |
>          xdt:untypedAtomic)            DOCUMENT(ANY)       CONTENT(XMLSCHEMA)
>                     |                                               |
>             CONTENT(UNTYPED)                             (?whole thing is valid
>                     |                                     according to schema)
>          (?no extraneous nodes)                                     |
>                     |                                      DOCUMENT(XMLSCHEMA)
>             DOCUMENT(UNTYPED)
>
> where the condition (?no extraneous nodes) is shorthand for SQL/XML's
> more precise "whose `children` property has exactly one XQuery element
> node, zero or more XQuery comment nodes, and zero or more XQuery
> processing instruction nodes".
>
> So that (?no extraneous nodes) condition is required for any of
> the XML(DOCUMENT...) types. When you relax that condition, you have
> an XML(CONTENT...) type.
>
> The XMLDOCUMENT constructor is so named because it constructs what
> corresponds to an XQuery document node—which actually corresponds to
> the XML(CONTENT...) SQL/XML types, and does not enforce having a
> single root element:
>
> "This data model is more permissive: a Document Node may be empty,
> it may have more than one Element Node as a child, and it also
> permits Text Nodes as children."[4]

Thanks a lot for pointing that out! I guess it's clear now.
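
To make the (?no extraneous nodes) condition concrete, here is a minimal standalone sketch in C. The NodeKind enum and the children_are_document() helper are hypothetical names invented for illustration; they are not PostgreSQL or libxml2 APIs. The sketch only models the rule quoted above: exactly one element child, any number of comments and processing instructions, and nothing else.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical child-node kinds, for illustration only. */
typedef enum
{
    NODE_ELEMENT,
    NODE_COMMENT,
    NODE_PI,
    NODE_TEXT
} NodeKind;

/*
 * Returns true if the children of a document node satisfy the
 * "(?no extraneous nodes)" condition, i.e. the value would qualify
 * for an XML(DOCUMENT...) type rather than only XML(CONTENT...).
 */
static bool
children_are_document(const NodeKind *children, size_t n)
{
    size_t elements = 0;

    for (size_t i = 0; i < n; i++)
    {
        switch (children[i])
        {
            case NODE_ELEMENT:
                elements++;
                break;
            case NODE_COMMENT:
            case NODE_PI:
                /* allowed any number of times */
                break;
            default:
                /* text (or any other kind) is extraneous */
                return false;
        }
    }
    /* an empty document node or several root elements both fail */
    return elements == 1;
}
```

An empty child list, two element children, or a text child would all be acceptable for XML(CONTENT...) but fail this check, which is the distinction IS DOCUMENT is meant to draw.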

>
> So in terms of the SQL/XML type hierarchy, what you get back from
> XMLDOCUMENT ... RETURNING CONTENT will have one of the XML(CONTENT...)
> types (whether it's CONTENT(ANY) or CONTENT(UNTYPED) is left to the
> implementation).
>
> If you then want to know if it is single-rooted, you can apply the
> IS DOCUMENT predicate, or try to cast it to an XML(DOCUMENT...) type.
>
> (And if you use XMLDOCUMENT ... RETURNING SEQUENCE, then you get a
> value of type XML(SEQUENCE). The sequence has length 1, a document
> node, making it safely castable to XML(CONTENT(ANY)), but whether
> you can cast it to an XML(DOCUMENT...) type will depend on what
> children that document node has.)
>
> Long story short, an XMLDOCUMENT constructor that enforced having
> a single root element would be nonconformant.
>

If I understand correctly, the compliant approach would be to always
treat the input expression as CONTENT:

PG_RETURN_XML_P(xmlparse((text *) data, XMLOPTION_CONTENT, true));

Is that right?

>
>> 1 - https://www.ibm.com/docs/en/db2/11.1?topic=constructors-document-node
>> 2 - https://www.postgresql.org/docs/17/xml-limits-conformance.html
> 3 -
> https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#SQL.2FXML:2003_contrasted_with_SQL.2FXML_since_2006
> 4 - https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode
>

Best, Jim
