Re: XML with invalid chars

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XML with invalid chars
Date: 2011-04-28 03:22:37
Message-ID: 4DB8DD7D.3070905@dunslane.net
Lists: pgsql-hackers

On 04/27/2011 05:30 PM, Noah Misch wrote:
>
>> I'm not sure what to do about the back branches and cases where data is
>> already in databases. This is fairly ugly. Suggestions welcome.
> We could provide a script in (or linked from) the release notes for testing the
> data in all your xml columns.

Yeah, we'll have to do something like that. What a blasted mess.

> To make things worse, the dump/reload problem seems to depend on your version
> of libxml2, or something.  With git master, a CentOS 5 system with
> 2.6.26-2.1.2.8.el5_5.1 accepts the ^A byte, but an Ubuntu 8.04 LTS system with
> 2.6.31.dfsg-2ubuntu rejects it.  Even with a patch like this, systems with a
> lenient libxml2 will be liable to store XML data that won't restore on a system
> with a strict libxml2.  Perhaps we should emit a build-time warning if the local
> libxml2 is lenient?

No, I think we need to be strict ourselves.

>> + 				if (*p < '\x20')
> This needs to be an unsigned comparison.  On my system, "char" is signed, so
> "SELECT xmlelement(name foo, null, E'\u0550')" fails incorrectly.

Good point. Perhaps we'd be better off using iscntrl(*p).
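
To illustrate the signedness problem, here is a minimal sketch, not the
patch itself. The helper names are made up for this example. U+0550
encodes in UTF-8 as the bytes 0xD5 0x90, and where plain "char" is signed
the 0xD5 byte is negative, so it compares as less than '\x20'. Casting to
unsigned char first avoids that, and the same cast is needed before
calling iscntrl(), since passing it a negative value other than EOF is
undefined.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): true if the byte at p is
 * below 0x20. The cast matters: with a signed "char", bytes >= 0x80
 * are negative and would otherwise compare as less than 0x20.
 */
static bool
is_low_control_byte(const char *p)
{
    return (unsigned char) *p < 0x20;
}

/*
 * Hypothetical iscntrl()-based variant. The argument must be
 * representable as unsigned char (or be EOF), hence the cast.
 */
static bool
is_cntrl_byte(const char *p)
{
    return iscntrl((unsigned char) *p) != 0;
}
```

One caveat with the iscntrl() route: in the C locale it is also true for
tab, newline and carriage return, which XML 1.0 allows, and for 0x7F, so
those would need to be handled separately.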


> The XML character set forbids more than just control characters; see
> http://www.w3.org/TR/xml/#charsets.  We also ought to reject, for example,
> "SELECT xmlelement(name foo, null, E'\ufffe')".
>
> Injecting the check here aids "xmlelement" and "xmlforest", but "xmlcomment"
> and "xmlpi" still let the invalid byte through.  You can also still inject the
> byte into an attribute value via "xmlelement".  I wonder if it wouldn't make
> more sense to just pass any XML that we generate from scratch through libxml2.
> There are a lot of holes to plug, otherwise.
>


Maybe there are, but I'd want lots of convincing that we should do that 
at this stage. Maybe for 9.2. I think we can plug the holes fairly 
simply for xmlpi and xmlcomment, and catch the attributes by moving this 
check up into map_sql_value_to_xml_value().
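
For reference, a rough sketch of what a full check against the XML 1.0
"Char" production (the charsets section Noah linked) could look like. The
function name is hypothetical, and it assumes the value has already been
decoded into Unicode code points, which is a separate step not shown
here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: is code point c allowed by the XML 1.0 production
 *   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
 *          | [#x10000-#x10FFFF]
 * Operates on decoded code points, not on raw bytes.
 */
static bool
is_valid_xml_char(uint32_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD ||
        (c >= 0x20 && c <= 0xD7FF) ||
        (c >= 0xE000 && c <= 0xFFFD) ||
        (c >= 0x10000 && c <= 0x10FFFF);
}
```

With this, the ^A byte (U+0001) and U+FFFE are both rejected, while
U+0550 is accepted.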

This is a significant data integrity bug, much along the same lines as 
the invalidly encoded data holes we plugged a release or two back. I'm 
amazed we haven't hit it till now, but we're sure to see more of it - 
XML use with Postgres is growing substantially, I believe.

cheers

andrew
