Re: XML with invalid chars

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XML with invalid chars
Date: 2011-04-28 03:22:37
Message-ID: 4DB8DD7D.3070905@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/27/2011 05:30 PM, Noah Misch wrote:
>
>> I'm not sure what to do about the back branches and cases where data is
>> already in databases. This is fairly ugly. Suggestions welcome.
> We could provide a script in (or linked from) the release notes for testing the
> data in all your xml columns.

Yeah, we'll have to do something like that. What a blasted mess,

> To make things worse, the dump/reload problems seems to depend on your version
> of libxml2, or something. With git master, a CentOS 5 system with
> 2.6.26-2.1.2.8.el5_5.1 accepts the ^A byte, but an Ubuntu 8.04 LTS system with
> 2.6.31.dfsg-2ubuntu rejects it. Even with a patch like this, systems with a
> lenient libxml2 will be liable to store XML data that won't restore on a system
> with a strict libxml2. Perhaps we should emit a build-time warning if the local
> libxml2 is lenient?

No, I think we need to be strict ourselves.

>> + if (*p< '\x20')
> This needs to be an unsigned comparison. On my system, "char" is signed, so
> "SELECT xmlelement(name foo, null, E'\u0550')" fails incorrectly.

Good point. Perhaps we'd be better off using iscntrl(*p).

> The XML character set forbids more than just control characters; see
> http://www.w3.org/TR/xml/#charsets. We also ought to reject, for example,
> "SELECT xmlelement(name foo, null, E'\ufffe')".
>
> Injecting the check here aids "xmlelement" and "xmlforest" , but "xmlcomment"
> and "xmlpi" still let the invalid byte through. You can also still inject the
> byte into an attribute value via "xmlelement". I wonder if it wouldn't make
> more sense to just pass any XML that we generate from scratch through libxml2.
> There are a lot of holes to plug, otherwise.
>

Maybe there are, but I'd want lots of convincing that we should do that
at this stage. Maybe for 9.2. I think we can plug the holes fairly
simply for xmlpi and xmlcomment, and catch the attributes by moving this
check up into map_sql_value_to_xml_value().

This is a significant data integrity bug, much along the same lines as
the invalidly encoded data holes we plugged a release or two back. I'm
amazed we haven't hit it till now, but we're sure to see more of it -
XML use with Postgres is growing substantially, I believe.

cheers

andrew

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message HSIEN-WEN CHU 2011-04-28 03:33:44 VX_CONCURRENT flag on vxfs( 5.1 or later) for performance for postgresql?
Previous Message Vlad Arkhipov 2011-04-28 03:07:34 Re: Predicate locking