Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-12 12:25:12
Message-ID: AE24E2AB-1D94-41DF-AACF-597154A55B18@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul12, 2011, at 12:57 , Radosław Smogura wrote:
> On Tue, 12 Jul 2011 11:45:59 +0200, Florian Pflug wrote:
>> On Jul12, 2011, at 11:00 , Radosław Smogura wrote:
>>> On Sun, 10 Jul 2011 17:06:22 -0500, Robert Haas wrote:
>>>> Unless I am missing something, Florian is clearly correct here.
>>> For me not, because this should be fixed internally by making xml type sefe
>>
>> Huh??. Making the xml type safe is *exactly* what I'm trying to do here...
>>
>>> currently xml type may be used to keep proper XMLs and any kind of data, as well.
>>
>> As I pointed out before, that simply isn't true. Try storing
>> non-well-formed data into an XML column (there *are* ways to do
>> that, i.e. there are bugs, one if which I'm trying to fix here!)
>> and then dump and (try to) reload your database. Ka-wooooom!

You again very conveniently ignored me here, and thus the *fact*
that XML *doesn't* allow arbitrary textual values to be stored.
If it did, there would not be a "Ka-wooooom!" here.

I beg you to actually try this out. Put the result of an XPATH()
expression that returns a literal '<' into a column of type XML,
and dump and reload.

>>> If I ask, by any means select xpath(/text(...))..... I want to get text.
>>
>> And I want '3' || '4' to return the integer 34. Though luck! The fact that
>> XPATH() is declared to return XML, *not* TEXT means you don't get what you
>> want. Period. Feel free to provide a patch that adds a function XPATH_TEXT
>> if you feel this is an issue.
>>
>> XML *is* *not* simply an alias for TEXT! It's a distinct type, which its
>> down distinct rules about what constitutes a valid value and what doesn't.

Again, you ignored my answer.

>>> 3) What about current applications, folks probably uses this and are happy
>>> they get text, and will not see, that next release of PostgreSQL will break
>>> their applications.
>>
>> That, and *only* that, I recognize as a valid concern. However, and *again*
>> as I have pointer out before a *multiple* of times, backwards compatibility
>> is no excuse not to fix bugs. Plus, there might just as well be applications
>> which feed the contents of XML columns directly into a XML parser (as they
>> have every right to!) and don't expect that parser to throw an error. Which,
>> as it stands, we cannot guarantee. Having to deal with an error there is akin
>> to having to deal with integer columns containing 'foobar'!
>
> Bugs must be resolved in smart way, especially if they changes behaviour, with
> consideration of impact change will produce, removing support for xml resolves
> this bug as well. I've said problem should be resolved in different way.

Fine. So what does that "different way" look like? Keeping things as they are
is certainly not an option, since it failed as soon as you dump and reload
(Or simply cast the value to TEXT and back to XML).

>>> There is of course disadvantage of current behaviour as it may lead to
>>> inserting badly xmls (in one case), but I created example when auto escaping
>>> will create double escaped xmls, and may lead to insert inproper data (this
>>> is about 2nd patch where Florian add escaping, too).
>>>
>>> SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM (SELECT
>>> (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name
>>> "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as foo;
>>>
>>> xmlelement
>>> -------------------------
>>> <root sth="&amp;lt;n"/>
>>
>> Radosław, you've raised that point before, and I refuted it. The crucial
>> difference is that double-escaped values are well-formed, where as un-escaped
>> ones are not.
>>
>> Again, as I said before, the double-escaping done by XMLATTRIBUTES there is
>> not pretty. But its *not* XPATH()'s fault!. To see that, simply replace your
>> XPATH() expression with '&lt;n'::xml to see that.

And here too I see no response from you...

>> Frankly, Radosław, I get the feeling that you're not trying to understand my
>> answers to your objections, but instead keep repeating the same assertions
>> over and over again. Even though at least some of them, like XML
>> being able to
>> store arbitrary values, are simply wrong! And I'm getting pretty
>> tired of this...
>> So far, you also don't seem to have taken a single look at the actual
>> implementation of the patch, even though code review is an supposed to be an
>> integral part of the patch review process. I therefore don't believe
>> that we're
>> getting anywhere here.
> So far, you don't know if I taken a single look, your suspicious are wrong, and
> You try to blame me.

Well, you haven't commented on the code, so assumed that you haven't
look at it. May I instead assume that you did look at it, and found the
patch to be in good shape, implementation-wise?

> All of your sentences about "do not understanding" I may
> sent to you, and blame you with your words.

I think I have so far provided quite detailed responses to all of your
concerns. If no, please point me to one of your concerns where I haven't
either acknowledged that there is a problem, or have explained quite detailed
why there is none.

>> So please either start reviewing the actual implementation, and leave
>> the considerations about whether we want this or not to the eventual
>> committer.
>> Or, if you don't want to do that for one reason or another, pleaser consider
>> letting somebody else take over this review, i.e. consider removing your name
>> from the "Reviewer" field.
>
> If I do review I may put my comments, but "I get the feeling that you're not
> trying to understand my answers to your objections, but instead keep repeating
> the same assertions over and over again." - and in patch there is review of code.

I didn't see any code review. Maybe I missed it, though. Could you point me
to it again, please?

> So please either "stop" responding to my objections "and leave the considerations
> about whether we want this or not to the eventual committer."

It's usually the reviewer, not the patch author, who decides that it's time
for a committer to look at the patch. At that point, the reviewer then marks
the patch as "Ready for Committer". I can do that for you, of course, but
I didn't want to do it without your consent.

> And consider this new assertions...
> I store XML document in XML (escaped text, why not?), then recreate it by xpath
> and insert, with Your patch, I will not insert document but escaped text, which
> is wrong XML, and ALL your objections about current buggy not-escaping will FULLY
> apply to this situation.

As far as I know, this only happens if you put the result of XPATH() into
an XML *attribute*. If you instead put it into a text node (i.e, pass the
result of XPATH() to XMLELEMENT(), not XMLATTRIBUTES()), everything is fine.
In fact, *without* the patch, if you select a text node with XPATH() and
pass the result to XMLELEMENT(), the result is not well-formed.

Here's an example.

select xmlelement(
name "newtag",
(xpath('/oldtag/text()', '<oldtag>&lt;</oldtag>'))[1]
);

produces

xmlelement
--------------------
<newtag><</newtag>

That can't be right, can it?

Again, this currently just happens to work for the combination of
XMLATTRIBUTES and XPATH() because XPATH()'s failure to escape the
result is compensated by XMLATTRIBUTES()'s failure to not escape values
which are already of type XML.

> It's ping-pong, and this patch moves problem from one corner to other.
>
> For me this discussion is over. I putted my objections and suggestions. Full
> review is available in archives, and why to not escape is putted in review
> of your 2nd patch, about scalar values.

Please mark the two patches "Ready for Committer", then!

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shigeru Hanada 2011-07-12 12:25:36 Re: per-column generic option
Previous Message Robert Haas 2011-07-12 12:19:52 Re: per-column generic option