
Re: Fix XML handling with DOCTYPE

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Chapman Flack <chap(at)anastigmatix(dot)net>
Cc:	Ryan Lambert <ryan(at)rustprooflabs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Fix XML handling with DOCTYPE
Date:	2019-03-16 21:21:12
Message-ID:	24203.1552771272@sss.pgh.pa.us
Lists:	pgsql-hackers

Chapman Flack <chap(at)anastigmatix(dot)net> writes:
> On 03/16/19 16:55, Tom Lane wrote:
>> What do you think of the idea I just posted about parsing off the DOCTYPE
>> thing for ourselves, and not letting libxml see it?

> The principled way of doing that would be to pre-parse to find a DOCTYPE,
> and if there is one, leave it there and parse the input as we do for
> 'document'. Per XML, if there is a DOCTYPE, the document must satisfy
> the 'document' syntax requirements, and per SQL/XML:2006-and-later,
> 'content' is a proper superset of 'document', so if we were asked for
> 'content' and can successfully parse it as 'document', we're good,
> and if we see a DOCTYPE and yet it incurs a parse error as 'document',
> well, that's what needed to happen.

Hm, so, maybe just

(1) always try to parse as document. If successful, we're done.

(2) otherwise, if allowed by xmloption, try to parse using our
current logic for the CONTENT case.

This avoids adding any new assumptions about how libxml acts,
which is what I was hoping to achieve.

One interesting question is which error to report if both (1) and (2)
fail.
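
The two-step fallback above can be sketched as follows. This is only an illustration under stated assumptions, not the PostgreSQL or libxml code: it uses Python's standard `xml.etree.ElementTree` as a stand-in parser, approximates the CONTENT case by wrapping the input in a dummy root element, and the names `parse_xml`, `allow_content`, and `XmlParseFailure` are hypothetical. For the open question of which error to report, it simply keeps both and lets the caller choose, which is one possible answer among several.

```python
import xml.etree.ElementTree as ET


class XmlParseFailure(ValueError):
    """Hypothetical error type that carries both failures for the caller."""

    def __init__(self, document_error, content_error):
        super().__init__(
            f"not a document ({document_error}); not content ({content_error})"
        )
        self.document_error = document_error
        self.content_error = content_error


def parse_xml(text, allow_content):
    """Return "document" or "content" depending on which parse succeeded."""
    # Step (1): always try the stricter 'document' syntax first.
    try:
        ET.fromstring(text)
        return "document"
    except ET.ParseError as exc:
        document_error = exc

    # Step (2): fall back to 'content' only if the option allows it.
    if not allow_content:
        raise document_error

    try:
        # Assumption: a dummy wrapper element stands in for the existing
        # CONTENT logic. A DOCTYPE inside the wrapper is rejected, so input
        # with a DOCTYPE can only ever succeed as a document.
        ET.fromstring(f"<wrapper>{text}</wrapper>")
        return "content"
    except ET.ParseError as content_error:
        raise XmlParseFailure(document_error, content_error) from content_error
```

With this sketch, `parse_xml("<!DOCTYPE a><a/>", True)` is accepted as a document, `parse_xml("<a/><b/>", True)` falls through to the content path, and `parse_xml("<!DOCTYPE a><a/><b/>", True)` fails both ways, leaving both errors available on the raised exception.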

regards, tom lane

In response to

Re: Fix XML handling with DOCTYPE at 2019-03-16 21:11:29 from Chapman Flack

Responses

Re: Fix XML handling with DOCTYPE at 2019-03-16 22:33:19 from Chapman Flack
