Re: Review of patch Bugfix for XPATH() if expression returns a scalar value

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review of patch Bugfix for XPATH() if expression returns a scalar value
Date: 2011-06-29 20:26:39
Message-ID: 6A2C047F-7EE4-430D-9813-A1B215D81D56@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jun29, 2011, at 19:57 , Radosław Smogura wrote:
> This is review of patch
> https://commitfest.postgresql.org/action/patch_view?id=565
> "Bugfix for XPATH() if expression returns a scalar value"
>
> SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM (SELECT
> (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name
> "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as foo;
>
> xmlelement
> -------------------------
> <root sth="&amp;lt;n"/>
>
> It's clearly visible that value from attribute is "&lt;n", not "<". Every
> parser will read this as "&lt;n" which is not-natural and will require form
> consumer/developer to de-escape this on his side - roughly speaking this will
> be reported as serious bug.

There's a problem there, no doubt, but your proposed solution just moves the
problem around.

Here's your query, reformatted to be more legible

SELECT
XMLELEMENT(name root,
XMLATTRIBUTES(foo.namespace AS sth)
)
FROM (
SELECT
(XPATH('namespace-uri(/*)', x))[1] AS namespace
FROM (VALUES (XMLELEMENT(name "root",
XMLATTRIBUTES('<n' AS xmlns,
'<v' AS value),'<t')
)
) v(x)) as foo;

What happens is that the XPATH expression selects the xmlns
attribute with the value '<n'. To be well-formed xml, that value
must be escaped, so what is actually returned by the XPATH
call is '&lt;n'. The XMLATTRIBUTES() function, however, doesn't
distinguish between input of type XML and input of type TEXT,
so it figures it has to represent the *textual* value '&lt;n'
in xml, and produces '&amp;lt;n'.

Not escaping textual values returned by XPATH isn't a solution,
though. For example, assume someone inserts the value returned
by the XPATH() call in your query into a column of type XML.
If XPATH() returned '<n' literally instead of escaping it,
the column would contain non-well-formed XML in a column of type
XML. Not pretty, and pretty fatal during your next dump/reload
cycle.

Or, if you want another example, simply remove the XMLATTRIBUTES
call and use the value returned by XPATH as a child node directly.

SELECT
XMLELEMENT(name root,
foo.namespace
)
FROM (
SELECT
(XPATH('namespace-uri(/*)', x))[1] AS namespace
FROM (VALUES (XMLELEMENT(name "root",
XMLATTRIBUTES('<n' AS xmlns,
'<v' AS value),'<t')
)
) v(x)) as foo;

xmlelement
--------------------
<root>&lt;n</root>

Note that this correct! The <root> node contains a text node
representing the textual value '<n'. If XPATH() hadn't return
the value escaped, the query above would have produces

<root><n</root>

which is obviously wrong. It works in this case because XMLELEMENT
is smart enough to distinguish between child nodes gives as TEXT
values (those are escaped) and child nodes provided as XML values
(those are inserted unchanged).

XMLATTRIBUTES, however, doesn't make that distinction, and simply
escaped all attributes values independent of their type. Now, the
reason it does that is probably that *not* all XML values are
well-formed attribute values without additional escaping. For example,
you cannot have literal '<' and '>' in attribute values. So if
XMLATTRIBUTES, like XMLELEMENT never escaped values which are
already of type XML, the following piece of code

XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))

would produce

<a v="<node/>"/>

which, alas, is not well-formed :-(

The correct solution, I believe, is to teach XMLATTRIBUTES to
only escape those characters which are invalid in attribute
values but valid in XML (Probably only '<' and '>' but I'll
check). If there are no objects, I'll prepare a separate patch
for that. I don't want to include it in this patch because it's
really a distinct issue (even modifies a different function).

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2011-06-29 20:48:40 Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Previous Message Kevin Grittner 2011-06-29 20:22:01 serializable installcheck