Re: XMLATTRIBUTES vs. values of type XML

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMLATTRIBUTES vs. values of type XML
Date: 2011-08-11 09:12:51
Message-ID: 22F5933F-3D17-4556-8920-0AFD074859EB@phlo.org
Lists: pgsql-hackers

On Aug11, 2011, at 09:16 , Peter Eisentraut wrote:
> On fre, 2011-07-29 at 11:37 +0200, Florian Pflug wrote:
>> On Jul28, 2011, at 22:51 , Peter Eisentraut wrote:
>>> On ons, 2011-07-27 at 23:21 +0200, Florian Pflug wrote:
>>>> On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
>>>>> Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
>>>>> would behave as in
>>>>>
>>>>> SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES(XMLSERIALIZE(content '&amp;'::XML AS text) AS "a"))
>>>>
>>>> With both 9.1 and 9.2 this query returns
>>>>
>>>> xmlelement
>>>> --------------------
>>>> <t a="&amp;amp;"/>
>>>>
>>>> i.e. makes the value of "a" represent the *literal* string '&amp;', *not*
>>>> the literal string '&'. Just to be sure there's no misunderstanding here
>>>> - is this what you expect?
>>>
>>> Well, I expect it to fail.
>>
>> Now you've lost me. What exactly should fail under what circumstances?
>
> To me, the best solution still appears to be forbidding passing values
> of type xml to XMLATTRIBUTES, unless we find an obviously better
> solution that is not, "I came up with this custom escape function that I
> tweaked so that it appears to make sense".

Hm, OK, I see your point. However, if we simply raise an error in 9.2
and do nothing else, then we make it impossible to use the result of
an XPath expression as an XML attribute value. Not just inconvenient,
but impossible, so I don't think we can do that. We'd thus need to add a
function

XMLUNESCAPE(XML) RETURNS TEXT

to restore that functionality. Defining a sane behaviour for such a function,
however, seems no easier than defining sane behaviour for an XML attribute
value that is already of type XML. The core of the problem remains to define the
result of XMLUNESCAPE('<tag>content</tag>'), just as the core of the XMLATTRIBUTES
problem is to define XMLELEMENT(... XMLATTRIBUTES('<tag>content</tag>' as a)).

Thinking about this further, it seems that we essentially have two distinct
classes of XML values. Some are essentially plain text, but might contain
entity references, while others are "real" XML fragments which contain at
least one tag. That suggests that a sensible behaviour for XMLUNESCAPE might
be to return a string with the entity references resolved in the former case,
and simply return an error in the latter.

To summarize, we'd have

XMLUNESCAPE(''::XML) -> ''
XMLUNESCAPE('a'::XML) -> 'a'
XMLUNESCAPE('&lt;'::XML) -> '<'
XMLUNESCAPE('<t/>'::XML) -> error
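
The proposed semantics can be modelled outside the database. The following is
only a rough Python sketch of the rule described above, not an implementation
of anything that exists in PostgreSQL. The name xml_unescape and the dummy
"wrapper" element are assumptions made for illustration, and the standard
library parser is used to resolve the entity references.

```python
import xml.etree.ElementTree as ET


def xml_unescape(content: str) -> str:
    """Hypothetical model of the proposed XMLUNESCAPE(XML) RETURNS TEXT."""
    # Wrap the content fragment in a dummy root element so that the
    # parser accepts plain text as well as fragments with tags.
    root = ET.fromstring(f"<wrapper>{content}</wrapper>")
    # Any child element means the value is a "real" XML fragment.
    if len(root) > 0:
        raise ValueError("XML value contains element tags")
    # Entity references were resolved by the parser; empty content
    # yields None, which is mapped to the empty string.
    return root.text or ""
```

Under these assumptions, xml_unescape('&lt;') returns '<', while
xml_unescape('<t/>') raises ValueError, mirroring the summary above.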

To not break applications needlessly, I'd then be inclined to make

XMLATTRIBUTES(xml_value as "a")

mean

XMLATTRIBUTES(XMLUNESCAPE(xml_value) as "a")

i.e. throw an error if xml_value contains anything but plain text and
entity references. But I could probably also live with not doing that.
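
To make the difference concrete, here is a small Python sketch contrasting
escaping the serialized text of an XML value (which yields the doubly-escaped
result shown earlier in the thread) with the implicit unescaping proposed
here. The function names attribute_from_text and attribute_from_xml are
hypothetical, and the sketch only models the rule, it says nothing about how
the backend would implement it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def attribute_from_text(name: str, value: str) -> str:
    # Escape a plain-text value and render it as name="value".
    return f"{name}={quoteattr(value)}"


def attribute_from_xml(name: str, xml_value: str) -> str:
    # Proposed rule: resolve entity references first and reject
    # values that contain element tags.
    root = ET.fromstring(f"<wrapper>{xml_value}</wrapper>")
    if len(root) > 0:
        raise ValueError("XML value contains element tags")
    return attribute_from_text(name, root.text or "")


# Escaping the serialized text escapes the ampersand a second time.
print(attribute_from_text("a", "&amp;"))
# Unescaping first keeps the attribute value equal to the literal '&'.
print(attribute_from_xml("a", "&amp;"))
```

The first call prints a="&amp;amp;" and the second prints a="&amp;".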

>>> Unfortunately, in the latest SQL/XML standard the final
>>> answer is nested deep in three other standards, so I don't have an
>>> answer right now. But there are plenty of standards in this area, so
>>> I'd hope that one of them can give us the right behavior, instead of us
>>> making something up.
>>
>> Which standards do you have in mind there? If you can point me to a place
>> where I can obtain them, I could check if there's something in them
>> which helps.
>
> In SQL/XML 2008, the actual behavior of XMLSERIALIZE is delegated to
> "XSLT 2.0 and XQuery 1.0 Serialization". I'm not familiar with this
> latter standard, but it appears to have lots of options and parameters,
> one of which might help us.

I'll try to obtain a copy of that. Thanks.

best regards,
Florian Pflug
