Re: Issue: Deprecation of the XML2 module 'xml_is_well_formed' function

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mike Fowler <mike(at)mlfowler(dot)com>
Cc: Mike Rylander <mrylander(at)gmail(dot)com>, Mike Berrow <mberrow(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issue: Deprecation of the XML2 module 'xml_is_well_formed' function
Date: 2010-07-01 22:41:31
Message-ID: AANLkTimwfCMIxIUXKV4sYnaQAbWIoGg0hH41xhuquwI-@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 1, 2010 at 12:25 PM, Mike Fowler <mike(at)mlfowler(dot)com> wrote:
> Quoting Mike Fowler <mike(at)mlfowler(dot)com>:
>
>> Should the IS DOCUMENT predicate support this? At the moment you get
>> the following:
>>
>> template1=# SELECT
>>
>> '<towns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns>'
>>  IS
>> DOCUMENT;
>> ?column?
>> ----------
>> t
>> (1 row)
>>
>> template1=# SELECT
>>
>> '<towns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns'
>>  IS
>> DOCUMENT;
>> ERROR:  invalid XML content
>> LINE 1: SELECT '<towns><town>Bidford-on-Avon</town><town>Cwmbran</to...
>>              ^
>> DETAIL:  Entity: line 1: parser error : expected '>'
>>
>> owns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns
>>
>>      ^
>> Entity: line 1: parser error : chunk is not well balanced
>>
>> owns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns
>>
>>      ^
>> I would've hoped the second would've returned 'f' rather than failing.
>> I've had a glance at the XML/SQL standard and I don't see anything in
>> the detail of the predicate (8.2) that would specifically prohibit us
>> from changing this behavior, unless the common rule  'Parsing a string
>> as an XML value' (10.16) must always be in force. I'm no standard
>> expert, but IMHO this would be an acceptable change to improve
>> usability. What do others think?
>
> Right, I've answered my own question whilst sitting in the open source
> coding session at CHAR(10). Yes, IS DOCUMENT should return false for a
> non-well formed document, and indeed is coded to do such. However, the
> conversion to the xml type which happens before the underlying
> xml_is_document function is even called fails and exceptions out. I'll work
> on a patch to resolve this behavior such that IS DOCUMENT will give you the
> missing 'xml_is_well_formed' function.

I think the point if "IS DOCUMENT" is to distinguish a document:

<foo>some stuff<bar/><baz/></foo>

from a document fragment:

<bar/><baz/>

A document is allowed only one toplevel tag.

It'd be nice, I think, to have a function that tells you whether
something is legal XML without throwing an error if it isn't, but I
suspect that should be a separate function, rather than trying to jam
it into "IS DOCUMENT".

http://developer.postgresql.org/pgdocs/postgres/functions-xml.html#AEN15187

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message uwcssa 2010-07-02 01:20:25 hello
Previous Message Guillaume Lelarge 2010-07-01 22:31:30 Re: Cannot cancel the change of a tablespace