Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-06-29 20:48:40
Message-ID: 1471EE3B-B08E-418B-8D9F-54116812F6D5@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jun29, 2011, at 19:34 , Radosław Smogura wrote:
> B. 6. Current behaviour _is intended_ (there is "if" to check node type) and _"natural"_. In this particular case user ask for text content of some node, and this content is actually "<".

I don't buy that. The check for the node type is there because
two different libxml functions are used to convert nodes to
strings. The if has absolutely *zero* to do with escaping, expect
for that missing escape_xml() call in the "else" case.

Secondly, there is little point in having an type XML if we
don't actually ensure that values of that type can only contain
well-formed XML.

> B. 7. Similar behaviour may be observer e. g. in SQL Server(tm)
> SELECT x.value('(//text())[1]', 'varchar(256)') FROM #xml_t;
> Produced "<"

Whats the *type* of that value? I'm not familiar with the XPATH
support in SQL Server, but since you pass 'varchar(256)' as
the second parameter, I assume that this is how you tell it
the return type you want. Since you chose a textual type, not
quoting the value is perfectly fine. I suggest that you try
this again, but this time evaluate the XPATH expression so
that you get a value of type XML (Assuming such a type exists
in SQL Server)

> B. 8. Even if current behaviour may be treated as wrong it was exposed and other may depends on not escaped content.

Granted, there's the possibility of breaking existing applications
here. But the same argument could have been applied when we made
check against invalid characters (e.g. invalid UTF-8 byte sequences)
tighter for textual types.

When it comes to data integrity (and non-well-formed values in
XML columns are a data integrity issue), I believe that the advantages
of tighter checks out-weight potential compatibility problems.

> B. 9. I think, correct approach should go to create new function (basing on one existing) that will be able to escape above. In this situation
> call should look like (for example only!):
> SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>')))[1]
> or
> SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>'))[1])
> One method to operate on array one to operate on single XML datum.
> Or to think about something like xmltext(). (Compare current status of xml.c)

-1. Again, value of type XML should always be well-formed XML.
Requiring every future user of posgres XML support to remember
to surround XPATH() calls with XML_ESCAPE() will undoubtedly lead
to bugs.

I'm all for having escaping and un-escaping functions though,
but these *need* to have the signatures

XML_ESCAPE(text) RETURNS xml
XML_UNESCAPE(XML) RETURNS text

best regards,
Florian Pflug

PS: Next time, please post your Review as a follow-up of the mail
that contains the patch. That makes sure that people who already
weighted in on the issue don't overlook your review. Or, the
patch author, for that matter - I nearly missed your Review between
the larger number of mail in my postgres folder.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2011-06-29 20:49:15 Re: default privileges wording
Previous Message Florian Pflug 2011-06-29 20:26:39 Re: Review of patch Bugfix for XPATH() if expression returns a scalar value