Re: BUG #8469: Xpath behaviour unintuitive / arguably wrong

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Bruce Momjian <bruce(at)momjian(dot)us>, dennis(dot)noordsij(at)helsinki(dot)fi
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8469: Xpath behaviour unintuitive / arguably wrong
Date: 2013-10-04 20:20:46
Message-ID: 524F231E.5010001@kaltenbrunner.cc
Lists: pgsql-bugs

On 10/02/2013 06:19 PM, Bruce Momjian wrote:
> On Tue, Sep 24, 2013 at 06:43:19PM +0000, dennis(dot)noordsij(at)helsinki(dot)fi wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference: 8469
>> Logged by: Dennis
>> Email address: dennis(dot)noordsij(at)helsinki(dot)fi
>> PostgreSQL version: 9.3.0
>> Operating system: FreeBSD 9.2-RC4
>> Description:
>>
>> Hi,
>>
>>
>> After upgrading an 8.1 version to 9.3.0 I am suddenly seeing text fields
>> containing "&amp;" where they are populated from XML. This may be a
>> coincidence and the problem may have existed earlier, in any case, now I
>> noticed.
>>
>>
>> I extract the text content of XML nodes using xpath, from something like:
>>
>>
>> <name>Jones &amp; Smith</name>
>>
>>
>> The reason I end up with "&amp;" is the IMHO rather odd xpath behaviour:
>>
>>
>> # select xpath('/a/text()', (select xmlelement(name "a", 'A & B')));
>>
>>
>> xpath
>> ---------------
>> {"A &amp; B"}
>>
>>
>> The canonical contents of "a" is "A & B". At first search I've found some
>> rather heated debates about this with bits of name calling; I certainly do
>> not want to get into that and I apologize in advance to those who feel very
>> strongly about this.
>>
>>
>> I've seen one "fix" describe the problem as:
>>
>>
>> ""DESCRIPTION: Submitter invokes following statement:
>> SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1].
>> He expect (escaped) result "&lt;", but gets "<"
>> """
>>
>>
>> With respect, this "bug" makes no sense as this produces in fact the right
>> result. The actual value of <root> is "<", it's just escaped when serialized
>> to XML. If <root> were to actually contain "&lt;", it'd be serialized as
>> "&amp;lt;". It should not be possible to be blindly cast to a text type, but
>> explicitly serialized as such.
>>
>>
>> At least the reviewer at:
>>
>>
>> http://www.postgresql.org/message-id/201106291934.23089.rsmogura@softperience.eu
>
> There are two other similar bug reports on this from February and March
> of this year:
>
> http://www.postgresql.org/message-id/E1U1FKL-0002rD-RO@wrihigleys.postgresql.org

I think that should be:
http://www.postgresql.org/message-id/E1U1FKL-0002rD-RO@wrigleys.postgresql.org

> http://www.postgresql.org/message-id/E1UHyUw-0001oj-HE@wrigleys.postgresql.org
>
> Someone who knows XML needs to take leadership on this and propose a
> patch.

Agreed.

Stefan
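
For readers following the thread, the distinction drawn in the report (a node's text value versus its serialized XML form) can be sketched outside PostgreSQL. This is only an illustration using Python's standard library, not a description of how PostgreSQL's xpath() is implemented. The helper names text_value and serialized are made up for this example, and the final unescape step is a hypothetical client-side workaround, not something proposed in the thread.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape


def text_value(document: str) -> str:
    """Return the parsed text content of the root element (entities decoded)."""
    return ET.fromstring(document).text or ""


def serialized(document: str) -> str:
    """Return the root element re-serialized as XML (entities escaped again)."""
    return ET.tostring(ET.fromstring(document), encoding="unicode")


doc = "<name>Jones &amp; Smith</name>"
value = text_value(doc)    # the node's value: contains a literal ampersand
markup = serialized(doc)   # the XML form: the ampersand is escaped again

# Escaping a value that already looks like an entity escapes it once more,
# which is the "&amp;lt;" case described in the report.
double_escaped = escape("&lt;")

# Hypothetical workaround (an assumption, not from the thread): decode the
# escaped text on the client after fetching it.
decoded = unescape("A &amp; B")
```

In this sketch the text value is what the reporter expects xpath(... text()) to yield when the result is used as plain text, while the serialized form corresponds to what he observes.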
