Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Markus Winand <markus(dot)winand(at)winand(dot)at>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
Date: 2018-06-21 20:27:30
Message-ID: 20180621202730.i6esu5wchi475suk@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have pushed the patch now (in your original form rather than my later
formulation) -- let's see how the buildfarm likes it. There are (at
least) three issues remaining, as per below; Pavel, do you have any
insight on these?

First one is about array indexes not working sanely (I couldn't get this
to work in Oracle)

> > Also, array indexes behave funny. First let's add more XML comments
> > inside that number, and query for the subscripts:
> >
> > update xmldata set data = regexp_replace(data::text, '7<!--small country-->91', '<!--ah-->7<!--oh-->9<!--uh-->1')::xml;
> >
> > SELECT xmltable.*
> > FROM (SELECT data FROM xmldata) x,
> > LATERAL XMLTABLE('/ROWS/ROW'
> > PASSING data
> > COLUMNS
> > country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
> > size_text float PATH 'SIZE/text()',
> > size_text_1 float PATH 'SIZE/text()[1]',
> > size_text_2 float PATH 'SIZE/text()[2]',
> > "SIZE" float, size_xml xml PATH 'SIZE')
> > where size_text is not null;
> >
> > country_name │ size_text │ size_text_1 │ size_text_2 │ size_text_3 │ SIZE │ size_xml
> > ──────────────┼───────────┼─────────────┼─────────────┼─────────────┼──────┼───────────────────────────────────────────────────────
> > Singapore │ 791 │ 791 │ 91 │ 1 │ 791 │ <SIZE unit="km"><!--ah-->7<!--oh-->9<!--uh-->1</SIZE>
> > (1 fila)

The second one is about (lack of!) processing instructions and comments:

> Also, node() matching comments or processing instructions
> seems to be broken too:
>
> SELECT *
> FROM (VALUES ('<xml><!--comment--></xml>'::xml)
> , ('<xml><?pi content?></xml>'::xml)
> ) d(x)
> CROSS JOIN LATERAL
> XMLTABLE('/xml'
> PASSING x
> COLUMNS "node()" TEXT PATH 'node()'
> ) t
>
> x | node()
> ---------------------------+--------
> <xml><!--comment--></xml> |
> <xml><?pi content?></xml> |
> (2 rows)
>
> I can look into this, but it may take a while.

Compare the empty second columns with oracle behavior, which returns the
contents of the PI and the comment. As a script for
http://rextester.com/l/oracle_online_compiler

create table xmltb (data xmltype) \\
insert into xmltb values ('<xml><!--the comment is here--></xml>') \\
insert into xmltb values ('<xml><?pi php_stuff(); do_stuff("hello"); ?></xml>') \\
SELECT * FROM xmltb, XMLTABLE('/xml' PASSING data COLUMNS node varchar2(100) PATH 'node()') t \\
drop table xmltb \\

The third issue is the way we output comments when they're in a column
of type XML:

> > Note what happens if I change the type from text to xml in that
> > column:
> >
> > SELECT *
> > FROM (VALUES ('<xml>te<!-- ahoy -->xt</xml>'::xml)
> > , ('<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml>'::xml)
> > ) d(x)
> > CROSS JOIN LATERAL
> > XMLTABLE('/xml'
> > PASSING x
> > COLUMNS "node()" xml PATH 'node()'
> > ) t;
> >
> > x │ node()
> > ───────────────────────────────────────────────────────────┼────────────────────────────────────────────────
> > <xml>te<!-- ahoy -->xt</xml> │ te ahoy xt
> > <xml><![CDATA[some <!-- really --> weird <stuff>]]></xml> │ some &lt;!-- really --&gt; weird &lt;stuff&gt;
> > (2 filas)
>
> The comment seems to be wrong.
>
> I guess it’s fine if the CDATA gets transformed in to an equivalent
> string using the XML entities. Yet, it might be better avoiding it.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-06-21 20:41:40 Re: Fast default stuff versus pg_upgrade
Previous Message Jeff Davis 2018-06-21 20:04:21 Re: Spilling hashed SetOps and aggregates to disk