Re: BUG #8469: Xpath behaviour unintuitive / arguably wrong

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: dennis(dot)noordsij(at)helsinki(dot)fi
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8469: Xpath behaviour unintuitive / arguably wrong
Date: 2013-10-02 16:19:46
Message-ID: 20131002161946.GB5960@momjian.us
Lists: pgsql-bugs

On Tue, Sep 24, 2013 at 06:43:19PM +0000, dennis(dot)noordsij(at)helsinki(dot)fi wrote:
> The following bug has been logged on the website:
>
> Bug reference: 8469
> Logged by: Dennis
> Email address: dennis(dot)noordsij(at)helsinki(dot)fi
> PostgreSQL version: 9.3.0
> Operating system: FreeBSD 9.2-RC4
> Description:
>
> Hi,
>
>
> After upgrading an 8.1 version to 9.3.0, I am suddenly seeing text fields
> containing "&amp;" where they are populated from XML. This may be a
> coincidence and the problem may have existed earlier; in any case, I have
> only noticed it now.
>
>
> I extract the text content of XML nodes using xpath, from something like:
>
>
> <name>Jones &amp; Smith</name>
>
>
> The reason I end up with "&amp;" is the IMHO rather odd xpath behaviour:
>
>
> # select xpath('/a/text()', (select xmlelement(name "a", 'A & B')));
>
>
> xpath
> ---------------
> {"A &amp; B"}
>
>
> The canonical contents of "a" is "A & B". At first search I've found some
> rather heated debates about this with bits of name calling; I certainly do
> not want to get into that and I apologize in advance to those who feel very
> strongly about this.
>
>
> I've seen one "fix" describe the problem as:
>
>
> """DESCRIPTION: Submitter invokes following statement:
> SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1].
> He expect (escaped) result "&lt;", but gets "<"
> """
>
>
> With respect, this "bug" makes no sense as this produces in fact the right
> result. The actual value of <root> is "<", it's just escaped when serialized
> to XML. If <root> were to actually contain "&lt;", it'd be serialized as
> "&amp;lt;". It should not be possible to cast it blindly to a text type;
> it should have to be serialized explicitly.
>
>
> At least the reviewer at:
>
>
> http://www.postgresql.org/message-id/201106291934.23089.rsmogura@softperience.eu
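
For anyone following along, the distinction drawn above between a node's
text value and its serialized XML form can be sketched outside the
database. This is only an illustration using Python's standard-library
xml.etree.ElementTree; it assumes nothing about how PostgreSQL's xpath()
is implemented, and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Parse the element from the report; the parser decodes the entity reference.
name = ET.fromstring("<name>Jones &amp; Smith</name>")

# The text value of the node is the decoded character data.
text_value = name.text

# Serializing the node back to XML escapes the ampersand again.
serialized = ET.tostring(name, encoding="unicode")

print(text_value)   # Jones & Smith
print(serialized)   # <name>Jones &amp; Smith</name>

# An element whose text really is the four characters "&lt;"
# is serialized with the ampersand itself escaped.
root = ET.Element("root")
root.text = "&lt;"
print(ET.tostring(root, encoding="unicode"))   # <root>&amp;lt;</root>
```

Under this reading, "A & B" is the value and "A &amp; B" is one
serialization of it, which seems to be the reporter's point.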

There are two other similar bug reports on this from February and March
of this year:

http://www.postgresql.org/message-id/E1U1FKL-0002rD-RO@wrigleys.postgresql.org
http://www.postgresql.org/message-id/E1UHyUw-0001oj-HE@wrigleys.postgresql.org

Someone who knows XML needs to take leadership on this and propose a
patch.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
