Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-06-29 17:34:23
Message-ID: 201106291934.23089.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Review of patch
https://commitfest.postgresql.org/action/patch_view?id=580

=== Patch description ===
SUMMARY: When text() based XPATH expression is invoked output is not XML
escaped

DESCRIPTION: Submitter invokes following statement:
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1].
He expect (escaped) result "&lt;", but gets "<"

AFFECTS: Possible this may affects situations when user wants to insert output
from above expression to XML column.

PATCH CONTENT:
A. 1. Patch fixes above problem (I don't treat this like problem, but like
enhancement).
A. 2. In addition patch contains test cases for above.
A. 3. Patch changes behaviour of method xml_xmlnodetoxmltype invoked by
xpath_internal, by adding escape_xml() call.
A. 4. I don't see any stability issues with this.
A. 5. Performance may be reduced and memory consumption may
increase due to internals of method escape_xml

=== Review ===
B. 1. Patch applies cleanly.

B. 2. Source compiles, and patch works as Submitter wants.

B. 3. Personally I missed some notes in documentation that such
expression will be escaped (those should be clearly exposed, as the "live"
functionality is changed).

B. 4. Submitter, possible, revealed some missed, minor functionality of
PostgreSQL XML. As he expects XML escaped output.

B. 5. Currently XPATH produces escaped output for Element nodes, and
not escaped output for all other types of nodes (including text,
comment, etc.)

B. 6. Current behaviour _is intended_ (there is "if" to check node type)
and _"natural"_. In this particular case user ask for text content of some
node, and this content is actually "<".

B. 7. Similar behaviour may be observer e. g. in SQL Server(tm)
SELECT x.value('(//text())[1]', 'varchar(256)') FROM #xml_t;
Produced "<"

B. 8. Even if current behaviour may be treated as wrong it was exposed
and other may depends on not escaped content.

B. 9. I think, correct approach should go to create new function
(basing on one existing) that will be able to escape above. In this
situation
call should look like (for example only!):
SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>')))[1]
or
SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>'))[1])
One method to operate on array one to operate on single XML datum.
Or to think about something like xmltext(). (Compare current status of xml.c)

B. 10. If such function (B.9.) is needed and if it will be included is out of
scope of this review.

Basing mainly on A.1, B.6., B.8., bearing in mind B.10., in my opinion this is
subject to reject as "need more work", "or as invalid".

The detailed explanation why such behaviour should not be implemented I will
send in review of https://commitfest.postgresql.org/action/patch_view?id=565.

Regards,

Radosław Smogura

P. S.
I would like to say sorry, for such late answaer, but I sent this from other
mail address, which was not attached to mailing list. Blame KDE KMail not me
:)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-06-29 17:42:34 Re: default privileges wording
Previous Message Hitoshi Harada 2011-06-29 17:22:07 Re: Parameterized aggregate subquery (was: Pull up aggregate subquery)