Skip site navigation (1) Skip section navigation (2)

Re: Review of patch Bugfix for XPATH() if expression returns a scalar value

From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review of patch Bugfix for XPATH() if expression returns a scalar value
Date: 2011-06-30 08:33:57
Message-ID: fc947491f738a2c6c2232ec1eecfa51a@mail.softperience.eu (view raw or flat)
Thread:
Lists: pgsql-hackers
 On Wed, 29 Jun 2011 22:26:39 +0200, Florian Pflug wrote:
> On Jun29, 2011, at 19:57 , Radosław Smogura wrote:
>> This is review of patch
>> https://commitfest.postgresql.org/action/patch_view?id=565
>> "Bugfix for XPATH() if expression returns a scalar value"
>>
>> SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) 
>> FROM (SELECT
>> (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES 
>> (XMLELEMENT(name
>> "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as 
>> foo;
>>
>>       xmlelement
>> -------------------------
>> <root sth="&amp;lt;n"/>
>>
>> 	It's clearly visible that value from attribute is "&lt;n", not "<". 
>> Every
>> parser will read this as "&lt;n" which is not-natural and will 
>> require form
>> consumer/developer to de-escape this on his side - roughly speaking 
>> this will
>> be reported as serious bug.
>
> There's a problem there, no doubt, but your proposed solution just 
> moves the
> problem around.
>
> Here's your query, reformatted to be more legible
>
> SELECT
>   XMLELEMENT(name root,
>              XMLATTRIBUTES(foo.namespace AS sth)
>   )
> FROM (
>   SELECT
>     (XPATH('namespace-uri(/*)', x))[1] AS namespace
>   FROM (VALUES (XMLELEMENT(name "root",
>                 XMLATTRIBUTES('<n' AS xmlns,
>                               '<v' AS value),'<t')
>   )
> ) v(x)) as foo;
>
> What happens is that the XPATH expression selects the xmlns
> attribute with the value '<n'. To be well-formed xml, that value
> must be escaped, so what is actually returned by the XPATH
> call is '&lt;n'. The XMLATTRIBUTES() function, however, doesn't
> distinguish between input of type XML and input of type TEXT,
> so it figures it has to represent the *textual* value '&lt;n'
> in xml, and produces '&amp;lt;n'.
>
> Not escaping textual values returned by XPATH isn't a solution,
> though. For example, assume someone inserts the value returned
> by the XPATH() call in your query into a column of type XML.
> If XPATH() returned '<n' literally instead of escaping it,
> the column would contain non-well-formed XML in a column of type
> XML. Not pretty, and pretty fatal during your next dump/reload
> cycle.
>
> Or, if you want another example, simply remove the XMLATTRIBUTES
> call and use the value returned by XPATH as a child node directly.
>
> SELECT
>   XMLELEMENT(name root,
>              foo.namespace
>   )
> FROM (
>   SELECT
>     (XPATH('namespace-uri(/*)', x))[1] AS namespace
>   FROM (VALUES (XMLELEMENT(name "root",
>                 XMLATTRIBUTES('<n' AS xmlns,
>                               '<v' AS value),'<t')
>   )
> ) v(x)) as foo;
>
>      xmlelement
> --------------------
>  <root>&lt;n</root>
>
> Note that this correct! The <root> node contains a text node
> representing the textual value '<n'. If XPATH() hadn't return
> the value escaped, the query above would have produces
>
>   <root><n</root>
>
> which is obviously wrong. It works in this case because XMLELEMENT
> is smart enough to distinguish between child nodes gives as TEXT
> values (those are escaped) and child nodes provided as XML values
> (those are inserted unchanged).
>
> XMLATTRIBUTES, however, doesn't make that distinction, and simply
> escaped all attributes values independent of their type. Now, the
> reason it does that is probably that *not* all XML values are
> well-formed attribute values without additional escaping. For 
> example,
> you cannot have literal '<' and '>' in attribute values. So if
> XMLATTRIBUTES, like XMLELEMENT never escaped values which are
> already of type XML, the following piece of code
>
>   XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))
>
> would produce
>
>   <a v="<node/>"/>
>
> which, alas, is not well-formed :-(
>
> The correct solution, I believe, is to teach XMLATTRIBUTES to
> only escape those characters which are invalid in attribute
> values but valid in XML (Probably only '<' and '>' but I'll
> check). If there are no objects, I'll prepare a separate patch
> for that. I don't want to include it in this patch because it's
> really a distinct issue (even modifies a different function).
>
> best regards,
> Florian Pflug

 You may manually enter invalid xml too, You don't need xpath for this. 
 Much more
 In PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit
 I executed
 SELECT XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))
 and it's looks like I got correct output "<a v="&lt;node/&gt;"/>"
 when I looked with text editor into table files I saw same value.

 I will check on last master if I can reproduce this.

 But indeed, now PostgreSQL is not type safe against XML type. See 
 SELECT XMLELEMENT(name "root", '<', (xpath('//text()', 
 '<root>&lt;</root>'))[1])).

 You found some problem with escaping, but solution is bad, I think 
 problem lies with XML type, which may be used to hold strings and proper 
 xml. For example above query can't distinguish if (xpath('//text()', 
 '<root>&lt;</root>'))[1] is xml text, or xml element.

 For me adding support for scalar is really good and needed, but 
 escaping is not.

 Regards,
 Radosław Smogura


In response to

Responses

pgsql-hackers by date

Next:From: Hitoshi HaradaDate: 2011-06-30 08:37:43
Subject: Re: Parameterized aggregate subquery (was: Pull up aggregate subquery)
Previous:From: Yeb HavingaDate: 2011-06-30 08:03:22
Subject: Re: Parameterized aggregate subquery

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group