Re: Fix XML handling with DOCTYPE

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ryan Lambert <ryan(at)rustprooflabs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fix XML handling with DOCTYPE
Date: 2019-03-17 22:31:00
Message-ID: 5C8ECAA4.3090301@anastigmatix.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/17/19 15:06, Tom Lane wrote:
> The error message issue is indeed a concern, but I don't see why it's
> complicated if you agree that
>
>> If the query asked for CONTENT, any error result should be one you could get
>> when parsing as CONTENT.
>
> That just requires us to save the first error message and be sure to issue
> that one not the DOCUMENT one, no?

I confess I haven't looked hard yet at how to do that. The way errors come
out of libxml is more involved than "here's a message", so there's a choice
of (a) try to copy off that struct in a way that's sure to survive
re-executing the parser, and then use the copy, or (b) generate a message
right away from the structured information and save that, and I guess b
might not be too bad; a might not be too bad, or it might, and slide right
back into the kind of libxml-behavior-assumptions you're wanting to avoid.

Meanwhile, here is a patch on the lines I proposed earlier, with a
pre-check. Any performance hit that it could entail (which I'd really
expect to be de minimis, though I haven't benchmarked) ought to be
compensated by the strlen I changed to strnlen in parse_xml_decl (as
there's really no need to run off and count the whole rest of the input
just to know if 1, 2, 3, or 4 bytes are available to decode a UTF-8 char).

... and, yes, I know that could be an independent patch, and then the
performance effect here should be measured from there. But it was near
what I was doing anyway, so I included it here.

Attaching both still-outstanding patches (this one and docfix) so the
CF app doesn't lose one.

Regards,
-Chap

Attachment Content-Type Size
xml-functions-type-docfix-3.patch text/x-patch 30.2 KB
xml-content-2006-3.patch text/x-patch 19.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-03-17 22:34:35 Re: Rare SSL failures on eelpout
Previous Message Tom Lane 2019-03-17 21:08:12 Re: Performance issue in foreign-key-aware join estimation