Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-12 09:45:59
Message-ID: 0A7466C6-B139-4B9C-96E7-1315DA03AD7F@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul12, 2011, at 11:00 , Radosław Smogura wrote:
> On Sun, 10 Jul 2011 17:06:22 -0500, Robert Haas wrote:
>> Unless I am missing something, Florian is clearly correct here.
> For me not, because this should be fixed internally by making xml type sefe

Huh??. Making the xml type safe is *exactly* what I'm trying to do here...

> currently xml type may be used to keep proper XMLs and any kind of data, as well.

As I pointed out before, that simply isn't true. Try storing
non-well-formed data into an XML column (there *are* ways to do
that, i.e. there are bugs, one if which I'm trying to fix here!)
and then dump and (try to) reload your database. Ka-wooooom!

> If I ask, by any means select xpath(/text(...))..... I want to get text.

And I want '3' || '4' to return the integer 34. Though luck! The fact that
XPATH() is declared to return XML, *not* TEXT means you don't get what you
want. Period. Feel free to provide a patch that adds a function XPATH_TEXT
if you feel this is an issue.

XML *is* *not* simply an alias for TEXT! It's a distinct type, which its
down distinct rules about what constitutes a valid value and what doesn't.

> 1) How I should descape node in client application (if it's part of xml I don't have header), bear in mind XML must give support for streaming processing too.

Huh?

> 2) Why I should differntly treat text() then select from varchar in both I ask for xml, driver can't make this, because it doesn't know if it gets scalar, text, comment, element, or maybe document.

> 3) What about current applications, folks probably uses this and are happy they get text, and will not see, that next release of PostgreSQL will break their applications.

That, and *only* that, I recognize as a valid concern. However, and *again*
as I have pointer out before a *multiple* of times, backwards compatibility
is no excuse not to fix bugs. Plus, there might just as well be applications
which feed the contents of XML columns directly into a XML parser (as they
have every right to!) and don't expect that parser to throw an error. Which,
as it stands, we cannot guarantee. Having to deal with an error there is akin
to having to deal with integer columns containing 'foobar'!

> There is of course disadvantage of current behaviour as it may lead to inserting badly xmls (in one case), but I created example when auto escaping will create double escaped xmls, and may lead to insert inproper data (this is about 2nd patch where Florian add escaping, too).
>
> SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM (SELECT
> (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name
> "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as foo;
>
> xmlelement
> -------------------------
> <root sth="&amp;lt;n"/>

Radosław, you've raised that point before, and I refuted it. The crucial
difference is that double-escaped values are well-formed, where as un-escaped
ones are not.

Again, as I said before, the double-escaping done by XMLATTRIBUTES there is
not pretty. But its *not* XPATH()'s fault!. To see that, simply replace your
XPATH() expression with '&lt;n'::xml to see that.

And in fact

> It can't be resolved without storing type in xml or adding xmltext or adding pseudo xmlany element, which will be returned by xpath.

Huh?

Frankly, Radosław, I get the feeling that you're not trying to understand my
answers to your objections, but instead keep repeating the same assertions
over and over again. Even though at least some of them, like XML being able to
store arbitrary values, are simply wrong! And I'm getting pretty tired of this...
So far, you also don't seem to have taken a single look at the actual
implementation of the patch, even though code review is an supposed to be an
integral part of the patch review process. I therefore don't believe that we're
getting anywhere here.

So please either start reviewing the actual implementation, and leave
the considerations about whether we want this or not to the eventual committer.
Or, if you don't want to do that for one reason or another, pleaser consider
letting somebody else take over this review, i.e. consider removing your name
from the "Reviewer" field.

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2011-07-12 09:46:09 Re: dropping table in testcase alter_table.sql
Previous Message Radosław Smogura 2011-07-12 09:00:29 Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected