Re: BUG #7844: xpath missing entity decoding - bug or feature

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dan Scott <denials(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: BUG #7844: xpath missing entity decoding - bug or feature
Date: 2013-11-04 18:44:28
Message-ID: 25508.1383590668@sss.pgh.pa.us
Lists: pgsql-bugs

Dan Scott <denials(at)gmail(dot)com> writes:
> On Sept 09, 2013 Bruce Momjian wrote:
>> On Fri, Feb 1, 2013 at 12:02:41PM +0000, info(at)fduerr(dot)de wrote:
>>> The following bug has been logged on the website:
>>>
>>> Bug reference: 7844
>>> Logged by: fduerr
>>> Email address: info(at)fduerr(dot)de
>>> PostgreSQL version: 9.2.2
>>> Operating system: Debian
>>> Description:
>>>
>>> Up until 9.1
>>>
>>> select (xpath('/z/text()', ('<z>' || 'AT&amp;T' || '</z>')::xml))[1];
>>>
>>> returned 'AT&T'
>>> 9.2 returns 'AT&amp;T'
>>>
>>> Is it a bug or a feature?
>>> Is there a function to decode xml-entities?

>> Does anyone have a comment on this?

> Yes, the Evergreen project just ran into this change of behaviour and consider
> it a bug.

> https://bugs.launchpad.net/evergreen/+bug/1243023 tells the tale, but in short
> the XPath spec states in "5.2 Element Nodes":

> "Entity references to both internal and external entities are expanded.
> Character references are resolved." (http://www.w3.org/TR/xpath/)

> So we believe that the extracted text node children of element nodes should be
> resolved when we retrieve them, as they were in 9.1 and before.

The change in behavior was entirely intentional, see
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=aaf15e5c1cf8d2c27d2f9841343f00027762cb4e
which was extensively discussed beforehand:
http://www.postgresql.org/message-id/flat/201106291934.23089.rsmogura@softperience.eu

Before we'd consider reverting this, you'd have to explain why it would be
okay for xpath() to not return valid XML. I don't see that the bit of
spec you mention has anything to do with that consideration --- it's
talking about some internal processing steps to be done by xpath(),
not the representation of the final result.

It does seem that there should be a way to convert the result to text with
character escaping undone. I'm not seeing anything built-in for that,
but maybe I'm missing it.

regards, tom lane
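
The distinction drawn above, between the string value XPath computes internally and the XML representation of the returned result, can be illustrated outside the database. The following is only a sketch in Python's standard library, not PostgreSQL code and not a built-in the thread identifies; the variable names are made up for illustration. It assumes the 9.1 and 9.2 outputs reported in the bug description.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# The document from the bug report, written out as one literal.
doc = "<z>AT&amp;T</z>"

# Parsing expands the entity reference, so the text node's string
# value is the plain text (the result 9.1 was reported to return).
string_value = ET.fromstring(doc).text

# Representing that text node as valid XML content means escaping it
# again (the result 9.2 was reported to return).
serialized = escape(string_value)

# Undoing the escaping by hand recovers the plain text.
# Note: unescape() only handles &amp;, &lt; and &gt; by default.
decoded = unescape(serialized)

print(string_value)  # AT&T
print(serialized)    # AT&amp;T
print(decoded)       # AT&T
```

Both forms describe the same text node; they differ only in whether the value is shown as XML content or as plain text.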
