Re: PostgreSQL vs SQL/XML Standards

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL vs SQL/XML Standards
Date: 2018-10-26 03:40:37
Message-ID: 5BD28CB5.4090500@anastigmatix.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/25/18 09:56, Alvaro Herrera wrote:
> Would you review Markus Winand patch here?
> https://postgr.es/m/8BDB0627-2105-4564-AA76-7849F028B96E@winand.at
> I think doing that would probably point out a couple of ways in which
> our XMLTABLE implementation is non-conformant, and then fixes it :-)
> I've been unsure as to applying it to all branches since 10 or just to
> master.

Well, modulo the key observation that it simply *is not* conformant
until it accepts XML Query expressions and uses the XPath 2.0 type system
and data model and the SQL/XML 2006+ casting rules ...

... and there is no ISO standard that says anything about how an XPath 1.0-
based quasi-XMLTABLE-ish function ought to behave, so it's hard to say
that anything this function does is right or wrong, per ISO ...

I think all of the changes in these patches do make it a more useful
quasi-XMLTABLE-ish function, as the pre-patch behaviors were less useful
(if not outright bewildering). And they produce output that better matches
what the XQuery-based ISO rules produce (for the subset of queries that
mean the same thing as XQuery and as XPath 1.0).

I also looked at the (not yet applied?)
XML-XPath-comments-processing-instructions-array-ind patch and I think
it, too, makes the behavior more useful. I did not actually take the
time to build a PostgreSQL with the patch, but I took the two added
regression queries, syntax-desugared them[1] and called the Saxon XQuery-
based "xmltable" example with them, and got the same expected results:

SELECT xmltable.* FROM
(SELECT '<root><element>a1a<!-- aaaa -->a2a<?aaaaa pi?> <!--z-->
bbbb<x>xxx</x>cccc</element></root>'::xml AS ".") AS p,
"xmltable"('/root', PASSING => p, COLUMNS => ARRAY[
'string(element)', 'string(element/comment()[2])',
'string(element/processing-instruction())', 'string(element/text()[1])',
'serialize(element)'])
AS (element text, cmnt text, pi text, t1 text, x text);

element | cmnt | pi | t1 |
x
----------------------+------+----+-----+---------------------------------------------------------------------------------
a1aa2a bbbbxxxcccc | z | pi | a1a | <element>a1a<!-- aaaa
-->a2a<?aaaaa pi?> <!--z--> bbbb<x>xxx</x>cccc</element>
(1 row)

SELECT xmltable.* FROM
(SELECT '<root><element>a1a<!-- aaaa -->a2a<?aaaaa pi?> <!--z-->
bbbb<x>xxx</x>cccc</element></root>'::xml AS ".") AS p,
"xmltable"('/root', PASSING => p, COLUMNS => ARRAY[
'string(element/text())', 'string(element/comment()[2])',
'string(element/processing-instruction())', 'string(element/text()[1])',
'serialize(element)'])
AS (element text, cmnt text, pi text, t1 text, x text);

ERROR: java.sql.SQLException: A sequence of more than one item is not
allowed as the first argument of fn:string() (text("a1a"), text("a2a"))

Agreement. Agreement is good. :)

So I think they are worth applying. I can't bring myself to a strong
opinion on whether they are or aren't worth backpatching; if it were
the function described by the standard, they'd be bugs and they would
be, but does making a non-standard function behave slightly more like
the standard function that it isn't count as a bug fix or an enhancement?

My overall feeling, at least in directing my own effort, is that I'd rather
spend time toward getting the real XQuery-based semantics in place somehow,
and ongoing enhancements to the XPath-1.0-based stuff feel more like
pouring treasure down a hole.

But these enhancements seem like good ones, and if there's interest in
patching a couple more, the "unexpected XPath object type {2,3}" in [2]
might be good candidates. That would be backpatchable, as the current
behavior clearly isn't useful.

One other thing: I think the commit message on the context-item patch
is really somewhat misleading: "According to the SQL standard, the context
of XMLTABLE's XPath row_expression is the document node of the XML input
document..." Really the standard says nothing of the sort. It is a
limitation of XPath 1.0 that the input even has to be a document at all.
In the real XMLTABLE, you could be passing a context item that is a
document node (of either 'document' or 'content' flavor), or a one-item
'sequence' holding an atomic type like a number or date, or even a naked
XML node like a PI or comment or attribute node you're more used to seeing
only inside a document. The real rule is just that the context item is
exactly the thing you passed (or a copy of it, when the rules say so).
It collapses in the XPath 1.0 case to having to be a document node, simply
because that's the only thing you can pass in.

What was wrong with the pre-patch code was that it wasn't just using
the 'doc' it was given, but actually calling xmlDocGetRootElement() on it
and setting the CI to one of its children. The patch just directly
assigns doc to xpathctx->node, which I would call correct, not because
it's a document node, but because it's the thing that was passed.

-Chap

[1] You might notice in addition to desugaring the XMLTABLE syntax, I
wrapped the text-returning column paths in string(), and wrapped the
xml-returning one in serialize() and changed its result column to text.
Those changes are just to work around the parts of the Saxon example that
aren't implemented yet, as explained in [3].

[2] https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XMLTABLE

[3] https://tada.github.io/pljava/examples/saxon.html#Using_the_Saxon_examples

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-10-26 03:52:27 Re: Ordered Partitioned Table Scans
Previous Message Pavel Stehule 2018-10-26 03:16:39 Re: PostgreSQL vs SQL/XML Standards