Re: Fix XML handling with DOCTYPE

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chapman Flack <chap(at)anastigmatix(dot)net>
Cc: Ryan Lambert <ryan(at)rustprooflabs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fix XML handling with DOCTYPE
Date: 2019-03-17 19:06:07
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Chapman Flack <chap(at)anastigmatix(dot)net> writes:
> On 03/17/19 13:16, Tom Lane wrote:
>> Do we need a pre-scan at all?

> Without it, we double the time to a failure result in every case that
> should actually fail, as well as in this one corner case that we want to
> see succeed, and the question you posed earlier about which error message
> to return becomes thornier.

I have absolutely zero concern about whether it takes twice as long to
detect bad input; nobody should be sending bad input if they're concerned
about performance. (The costs of the ensuing transaction abort would
likely dwarf xml_in's runtime in any case.) Besides, with what we're
talking about doing here,

(1) the extra runtime is consumed only in cases that would fail up to now,
so nobody can complain about a performance regression;
(2) doing a pre-scan *would* be a performance regression for cases that
work today; not a large one we hope, but still...

The error message issue is indeed a concern, but I don't see why it's
complicated if you agree that

> If the query asked for CONTENT, any error result should be one you could get
> when parsing as CONTENT.

That just requires us to save the first error message and be sure to issue
that one not the DOCUMENT one, no? That's what we'd want to do from a
backwards-compatibility standpoint anyhow, since that's the error message
wording you'd get with today's code.

regards, tom lane

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-17 19:41:43 Re: Unduly short fuse in RequestCheckpoint
Previous Message Jonathan S. Katz 2019-03-17 18:29:44 Re: jsonpath