Re: BUG #16046: xpath returns CDATA tag along with the value in postgres 12

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, mostafa_bit0108(at)hotmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Markus Winand <markus(dot)winand(at)winand(dot)at>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: BUG #16046: xpath returns CDATA tag along with the value in postgres 12
Date: 2019-10-24 23:14:48
Message-ID: 5DB23068.3080601@anastigmatix.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 10/24/19 17:38, Tom Lane wrote:
>> Query: SELECT unnest(xpath('//cname/aname/text()','<cname><aname><![CDATA[select 5]]></aname></cname>'::xml))
>>
>> Output - pg11: select 5
>>
>> Output - pg12: <![CDATA[select 5]]>
>
> ... What's not entirely clear to me is whether it's an intentional
> effect, or a bug. Authors, any comments?

Hmm. I would say the pg12 behavior is "not wrong". But it's unexpected.
xpath's return type is xml (well, array of), so the result must have a
form that can escape any characters mistakable for markup. In this example,
there aren't any, but once tweaked so there are:

SELECT
unnest(xpath('//cname/aname/text()',
'<cname><aname><![CDATA[select 5 & 6 <yahoo!>]]></aname></cname>'::xml));

pg12: <![CDATA[select 5 & 6 <yahoo!>]]>

the necessity is clear.

The other valid option would be to return, not CDATA, but a regular
text node, which would look like straight text if there were no special
characters in it, and would otherwise have every such character individually
escaped. That's what I get from pg11:

SELECT
unnest(xpath('//cname/aname/text()',
'<cname><aname><![CDATA[select 5 & 6 <yahoo!>]]></aname></cname>'::xml));

pg11: select 5 &amp; 6 &lt;yahoo!&gt;

So what pg11 is doing is also "not wrong" (in this respect, anyway).
And looks "more natural", in the case where the value has no characters
that need escaping.

Which may or may not be a good thing. Perhaps it could lead the unwary
in some cases to think such a query is giving a directly usable
text string back, which will be harmless until the one time a value
with escaping comes back. (The no-surprises way to get back a directly
usable text string, if that's what's wanted, would be with XMLTABLE
and an output column of text type.)

Oddly, what pg12 is doing seems to be influenced by the form of escaping
used in the input:

SELECT
unnest(xpath('//cname/aname/text()',
'<cname><aname><![CDATA[select 5 & 6 <yahoo!>]]></aname></cname>'::xml));
unnest
-----------------------------------
<![CDATA[select 5 & 6 <yahoo!>]]>

SELECT
unnest(xpath('//cname/aname/text()',
'<cname><aname>select 5 &amp; 6 &lt;yahoo!&gt;</aname></cname>'::xml));
unnest
---------------------------------
select 5 &amp; 6 &lt;yahoo!&gt;

Either form of result is correct, and having it respect the form that was
used in the input might even be delightfully smart.

I haven't looked in the code just now to see if it is intentionally being
delightfully smart, or more simplistic-and-lucky.

Regards,
-Chap

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Chapman Flack 2019-10-24 23:34:09 Re: BUG #16046: xpath returns CDATA tag along with the value in postgres 12
Previous Message Tom Lane 2019-10-24 21:38:11 Re: BUG #16046: xpath returns CDATA tag along with the value in postgres 12