Re: XML with invalid chars

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XML with invalid chars
Date: 2011-05-08 22:25:27
Message-ID: 4DC71857.5070902@dunslane.net
Lists: pgsql-hackers

On 04/27/2011 11:41 PM, Noah Misch wrote:
> On Wed, Apr 27, 2011 at 11:22:37PM -0400, Andrew Dunstan wrote:
>> On 04/27/2011 05:30 PM, Noah Misch wrote:
>>> To make things worse, the dump/reload problem seems to depend on your version
>>> of libxml2, or something. With git master, a CentOS 5 system with
>>> 2.6.26-2.1.2.8.el5_5.1 accepts the ^A byte, but an Ubuntu 8.04 LTS system with
>>> 2.6.31.dfsg-2ubuntu rejects it. Even with a patch like this, systems with a
>>> lenient libxml2 will be liable to store XML data that won't restore on a system
>>> with a strict libxml2. Perhaps we should emit a build-time warning if the local
>>> libxml2 is lenient?
>> No, I think we need to be strict ourselves.
> Then I suppose we'd also scan for invalid characters in xml_parse()? Or, at
> least, do so when linked to a libxml2 that neglects to do so itself?

Yep.

>>> Injecting the check here aids "xmlelement" and "xmlforest", but "xmlcomment"
>>> and "xmlpi" still let the invalid byte through. You can also still inject the
>>> byte into an attribute value via "xmlelement". I wonder if it wouldn't make
>>> more sense to just pass any XML that we generate from scratch through libxml2.
>>> There are a lot of holes to plug, otherwise.
>> Maybe there are, but I'd want lots of convincing that we should do that
>> at this stage. Maybe for 9.2. I think we can plug the holes fairly
>> simply for xmlpi and xmlcomment, and catch the attributes by moving this
>> check up into map_sql_value_to_xml_value().
> I don't have much convincing to offer -- hunting down the holes seems fine, too.
>
>

I think I've done that. Here's the patch I have now. It looks like we
can catch pretty much everything by putting checks in four places, which
isn't too bad.
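
For readers without the attachment, the kind of check being discussed can be
sketched roughly as below. This is not the code from the patch: the function
names are hypothetical, and the sketch assumes the text has already been
decoded into Unicode code points. The accepted ranges follow the Char
production of the XML 1.0 specification, which excludes most control
characters such as the ^A byte (0x01) mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): returns true if code point c
 * is allowed by the XML 1.0 Char production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
static bool
is_valid_xml_char(uint32_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD ||
        (c >= 0x20 && c <= 0xD7FF) ||
        (c >= 0xE000 && c <= 0xFFFD) ||
        (c >= 0x10000 && c <= 0x10FFFF);
}

/*
 * Hypothetical helper: scans n code points and returns the index of the
 * first one that is not a valid XML character, or -1 if all are valid.
 */
static long
first_invalid_xml_char(const uint32_t *cps, size_t n)
{
    size_t      i;

    for (i = 0; i < n; i++)
    {
        if (!is_valid_xml_char(cps[i]))
            return (long) i;
    }
    return -1;
}
```

A caller would run such a scan over each value before it is embedded in
generated XML and raise an error when the result is not -1.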

Please review and try to break.

cheers

andrew

Attachment: xmlchars2.patch (text/x-patch, 2.0 KB)
