XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

From: Markus Winand <markus(dot)winand(at)winand(dot)at>
To: pgsql-hackers(at)postgresql(dot)org
Subject: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
Date: 2018-03-27 11:52:26
Message-ID: 0684A598-002C-42A2-AE12-F024A324EAE4@winand.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I’ve found two issues with XML/XPath integration in PostgreSQL. Two patches are attached. I’ve just separated them because the second one is an incompatible change.

* Patch 0001: Accept TEXT and CDATA nodes in XMLTABLE’s column_expression.

> Column_expressions that match TEXT or CDATA nodes must return
> the contents of the node itself, not the content of the
> non-existing childs (i.e. the empty string).

The following query returns a wrong result in master:

FROM (VALUES ('<xml>text</xml>'::xml)
, ('<xml><![CDATA[<cdata>]]></xml>'::xml)
) d(x)
COLUMNS "node()" TEXT PATH 'node()'
) t

x | node()
<xml>text</xml> |
<xml><![CDATA[<cdata>]]></xml> |

(the “node()” column is empty)

The correct result can be obtained with patch 0001 applied:

x | node()
<xml>text</xml> | text
<xml><![CDATA[<cdata>]]></xml> | <cdata>

The patch adopts existing tests to guard against future regressions.

* Patch 0002: Set correct context for XPath evaluation.

> According to the ISO standard, the context of XMLTABLE's XPath
> row_expression is the document node of the XML[1] - not the root node.
> The respective code is shared with non-ISO functions such as xpath
> and xpath_exists so that the change also affects these functions.

It’s an incompatible change - even the regression tests need to be adjusted because they
test for the “wrong” behaviour. The documentation, on the other hand, does neither
document the behaviour explicitly, no does it show any example that breaks due to this patch.

The alternatives to this patch are (1) documenting the current standard-breaking behaviour
and (2) fixing the context only for ISO functions. Personally, I don’t like either of them.

I’ve checked the libxml2 code to see what’s the proper way to set the context to the
document node. It’s indeed just “(xmlNodePtr)ctxt->doc”.

See: https://github.com/GNOME/libxml2/blob/7abec671473b837f99181442d59edd0cc2ee01d1/xpath.c#L14306


[1] SQL/XML:2011, 6.18 General Rule 1aii2.

Attachment Content-Type Size
0001-Accept-TEXT-and-CDATA-nodes-in-XMLTABLE-s-column_exp.patch application/octet-stream 9.0 KB
0002-Set-correct-context-for-XPath-evaluation.patch application/octet-stream 4.3 KB
unknown_filename text/plain 7 bytes


Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2018-03-27 11:58:11 pg_class.reltuples of brin indexes
Previous Message Eren Başak 2018-03-27 11:50:35 Re: [HACKERS] Optional message to user when terminating/cancelling backend