Re: XML with invalid chars

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XML with invalid chars
Date: 2011-04-27 19:05:30
Message-ID: 4DB868FA.7020708@dunslane.net
Lists: pgsql-hackers

On 04/26/2011 05:11 PM, Noah Misch wrote:
> On Mon, Apr 25, 2011 at 07:25:02PM -0400, Andrew Dunstan wrote:
>> I came across this today, while helping a customer. The following will
>> happily create a piece of XML with an embedded ^A:
>>
>> select xmlelement(name foo, null, E'abc\x01def');
>>
>> Now, a ^A is totally forbidden in XML version 1.0, and allowed but only
>> as "&#x01;" or equivalent in XML version 1.1, and not as a 0x01 byte
>> (see <http://en.wikipedia.org/wiki/XML#Valid_characters>).
>>
>> ISTM this is something we should definitely try to fix ASAP, even if we
>> probably can't backpatch the fix.
> +1. Given that such a datum breaks dump+reload, it seems risky to do nothing at
> all in the back branches.

Here's a patch along the lines suggested by Peter.
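
For readers without the attachment, the rule in question can be sketched
as below. This is only an illustration of the XML 1.0 "Char" production
(#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]),
not the contents of xmlchars.patch; the function names are hypothetical,
and the input is assumed to be already decoded into Unicode code points.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from xmlchars.patch): returns true if
 * the Unicode code point c matches the XML 1.0 "Char" production.
 */
static bool
xml10_char_is_valid(uint32_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}

/*
 * Hypothetical helper: returns the index of the first code point in
 * cps[0..n-1] that XML 1.0 forbids, or n if every code point is allowed.
 */
static size_t
xml10_first_invalid(const uint32_t *cps, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (!xml10_char_is_valid(cps[i]))
            break;
    }
    return i;
}
```

Under this check the ^A in E'abc\x01def' is rejected at index 3, while
tab, newline and carriage return pass.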

I'm not sure what to do about the back branches and cases where data is
already in databases. This is fairly ugly. Suggestions welcome.

cheers

andrew

Attachment: xmlchars.patch (text/x-patch, 825 bytes)
