Re: Fix XML handling with DOCTYPE

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ryan Lambert <ryan(at)rustprooflabs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fix XML handling with DOCTYPE
Date: 2019-03-17 18:13:03
Message-ID: 5C8E8E2F.9050600@anastigmatix.net
Lists: pgsql-hackers

On 03/17/19 13:16, Tom Lane wrote:
> Chapman Flack <chap(at)anastigmatix(dot)net> writes:
>> What I was doing in the patch is the reverse: parsing with the expectation
>> of CONTENT to see if a DTD gets tripped over. It isn't allowed for an
>> element to precede a DTD, so that approach can be expected to fail fast
>> if the other branch needs to be taken.
>
> Ah, right. I don't have any problem with trying the CONTENT approach
> before the DOCUMENT approach rather than vice-versa. What I was concerned
> about was adding a lot of assumptions about exactly how libxml would
> report the failure. IMO a maximally-safe patch would just rearrange
> things we're already doing without adding new things.
>
>> But a quick pre-scan for the same thing would have the same property,
>> without the libxml dependencies that bother you here. Watch this space.
>
> Do we need a pre-scan at all?

Without it, we double the time it takes to report failure in every case
that should actually fail, and we also double the parsing work in the one
corner case that we want to see succeed. The question you posed earlier
about which error message to return also becomes thornier.

If the query asked for CONTENT, any error result should be one you could get
when parsing as CONTENT. If we switch and try parsing as DOCUMENT _because
the input is claiming to have the form of a DOCUMENT_, then it's defensible
to return errors explaining why it's not a DOCUMENT ... but not in the
general case of just throwing DOCUMENT at it any time CONTENT parse fails.
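
To make the rule above concrete, here is a minimal sketch of the dispatch
logic, written in Python for brevity rather than in the C of the actual
patch. Every name in it (claims_to_be_document, parse_xml, and the two
caller-supplied parser callables) is hypothetical, and the pre-scan is only
an approximation of what may legally precede a DTD: whitespace, an XML
declaration or processing instruction, and comments.

```python
import re

# Hypothetical approximation of what may precede a DTD in a document prolog:
# whitespace, an XML declaration or processing instruction, or a comment.
_PROLOG_MISC = re.compile(r"\s+|<\?.*?\?>|<!--.*?-->", re.DOTALL)


def claims_to_be_document(text: str) -> bool:
    """Cheap pre-scan: does the input present itself as having a DOCTYPE?"""
    pos = 0
    while True:
        m = _PROLOG_MISC.match(text, pos)
        if m is None:
            break
        pos = m.end()  # every alternative consumes at least one character
    return text.startswith("<!DOCTYPE", pos)


def parse_xml(text, requested, parse_content, parse_document):
    """Parse exactly once, so any error comes from the mode that applies.

    requested is "CONTENT" or "DOCUMENT"; parse_content and parse_document
    are caller-supplied callables (assumptions here) that raise on error.
    """
    if requested == "DOCUMENT" or claims_to_be_document(text):
        # The input claims DOCUMENT form, so DOCUMENT errors are defensible.
        return parse_document(text)
    # Otherwise a CONTENT request only ever yields CONTENT errors.
    return parse_content(text)
```

With this shape, a CONTENT request never falls back to a second parse just
because the first one failed; the switch to DOCUMENT happens only when the
input itself leads with a DOCTYPE.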

Regards,
-Chap
