Re: Review of patch Bugfix for XPATH() if expression returns a scalar value

From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review of patch Bugfix for XPATH() if expression returns a scalar value
Date: 2011-06-30 08:33:57
Message-ID: fc947491f738a2c6c2232ec1eecfa51a@mail.softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 29 Jun 2011 22:26:39 +0200, Florian Pflug wrote:
> On Jun29, 2011, at 19:57 , Radosław Smogura wrote:
>> This is review of patch
>> https://commitfest.postgresql.org/action/patch_view?id=565
>> "Bugfix for XPATH() if expression returns a scalar value"
>>
>> SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth))
>> FROM (SELECT
>> (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES
>> (XMLELEMENT(name
>> "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as
>> foo;
>>
>> xmlelement
>> -------------------------
>> <root sth="&amp;lt;n"/>
>>
>> It's clearly visible that value from attribute is "&lt;n", not "<".
>> Every
>> parser will read this as "&lt;n" which is not-natural and will
>> require form
>> consumer/developer to de-escape this on his side - roughly speaking
>> this will
>> be reported as serious bug.
>
> There's a problem there, no doubt, but your proposed solution just
> moves the
> problem around.
>
> Here's your query, reformatted to be more legible
>
> SELECT
> XMLELEMENT(name root,
> XMLATTRIBUTES(foo.namespace AS sth)
> )
> FROM (
> SELECT
> (XPATH('namespace-uri(/*)', x))[1] AS namespace
> FROM (VALUES (XMLELEMENT(name "root",
> XMLATTRIBUTES('<n' AS xmlns,
> '<v' AS value),'<t')
> )
> ) v(x)) as foo;
>
> What happens is that the XPATH expression selects the xmlns
> attribute with the value '<n'. To be well-formed xml, that value
> must be escaped, so what is actually returned by the XPATH
> call is '&lt;n'. The XMLATTRIBUTES() function, however, doesn't
> distinguish between input of type XML and input of type TEXT,
> so it figures it has to represent the *textual* value '&lt;n'
> in xml, and produces '&amp;lt;n'.
>
> Not escaping textual values returned by XPATH isn't a solution,
> though. For example, assume someone inserts the value returned
> by the XPATH() call in your query into a column of type XML.
> If XPATH() returned '<n' literally instead of escaping it,
> the column would contain non-well-formed XML in a column of type
> XML. Not pretty, and pretty fatal during your next dump/reload
> cycle.
>
> Or, if you want another example, simply remove the XMLATTRIBUTES
> call and use the value returned by XPATH as a child node directly.
>
> SELECT
> XMLELEMENT(name root,
> foo.namespace
> )
> FROM (
> SELECT
> (XPATH('namespace-uri(/*)', x))[1] AS namespace
> FROM (VALUES (XMLELEMENT(name "root",
> XMLATTRIBUTES('<n' AS xmlns,
> '<v' AS value),'<t')
> )
> ) v(x)) as foo;
>
> xmlelement
> --------------------
> <root>&lt;n</root>
>
> Note that this correct! The <root> node contains a text node
> representing the textual value '<n'. If XPATH() hadn't return
> the value escaped, the query above would have produces
>
> <root><n</root>
>
> which is obviously wrong. It works in this case because XMLELEMENT
> is smart enough to distinguish between child nodes gives as TEXT
> values (those are escaped) and child nodes provided as XML values
> (those are inserted unchanged).
>
> XMLATTRIBUTES, however, doesn't make that distinction, and simply
> escaped all attributes values independent of their type. Now, the
> reason it does that is probably that *not* all XML values are
> well-formed attribute values without additional escaping. For
> example,
> you cannot have literal '<' and '>' in attribute values. So if
> XMLATTRIBUTES, like XMLELEMENT never escaped values which are
> already of type XML, the following piece of code
>
> XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))
>
> would produce
>
> <a v="<node/>"/>
>
> which, alas, is not well-formed :-(
>
> The correct solution, I believe, is to teach XMLATTRIBUTES to
> only escape those characters which are invalid in attribute
> values but valid in XML (Probably only '<' and '>' but I'll
> check). If there are no objects, I'll prepare a separate patch
> for that. I don't want to include it in this patch because it's
> really a distinct issue (even modifies a different function).
>
> best regards,
> Florian Pflug

You may manually enter invalid xml too, You don't need xpath for this.
Much more
In PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit
I executed
SELECT XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))
and it's looks like I got correct output "<a v="&lt;node/&gt;"/>"
when I looked with text editor into table files I saw same value.

I will check on last master if I can reproduce this.

But indeed, now PostgreSQL is not type safe against XML type. See
SELECT XMLELEMENT(name "root", '<', (xpath('//text()',
'<root>&lt;</root>'))[1])).

You found some problem with escaping, but solution is bad, I think
problem lies with XML type, which may be used to hold strings and proper
xml. For example above query can't distinguish if (xpath('//text()',
'<root>&lt;</root>'))[1] is xml text, or xml element.

For me adding support for scalar is really good and needed, but
escaping is not.

Regards,
Radosław Smogura

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hitoshi Harada 2011-06-30 08:37:43 Re: Parameterized aggregate subquery (was: Pull up aggregate subquery)
Previous Message Yeb Havinga 2011-06-30 08:03:22 Re: Parameterized aggregate subquery