Re: Fix XML handling with DOCTYPE

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ryan Lambert <ryan(at)rustprooflabs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fix XML handling with DOCTYPE
Date: 2019-03-18 17:27:10
Message-ID: 5C8FD4EE.7090107@anastigmatix.net
Lists: pgsql-hackers

There might be too many different email threads on this with patches,
but in case it went under the radar, xml-content-2006-3.patch appeared
in my previous message on this thread [1].

It is based on a simple pre-check of the prefix of the input, which
determines the form of parse to apply. That may or may not be simpler than
parse-once-save-error-parse-again-report-first-error, but IMV it is a more
direct and clearer solution. The logic is plainly about "how do I determine
the way this input should be parsed?", which is the problem on the table,
rather than "how should I save and regurgitate this libxml error?", which
turns the problem on the table into a different one.
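For readers who haven't opened the patch, here is a minimal sketch of what a
prefix pre-check of this kind can look like. It is NOT the code from
xml-content-2006-3.patch; the function name prefix_says_document is
hypothetical, and it takes the relaxed shortcuts discussed below (comments and
PIs are skipped by locating their end markers only, and an XML declaration is
treated like any other "<?...?>"). It follows the XML prolog grammar: optional
declaration, then any mix of whitespace, comments and PIs, then possibly a
DOCTYPE.

```c
#include <stdbool.h>
#include <string.h>

/* XML whitespace (the S production): space, tab, CR, LF. */
static bool is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/*
 * Hypothetical helper: returns true if, after skipping whitespace, comments
 * and PIs (including an XML declaration), the input starts with "<!DOCTYPE",
 * i.e. it should be parsed as a document rather than as content.
 * Simplification: comment and PI contents are not validated.
 */
static bool prefix_says_document(const char *p)
{
    for (;;)
    {
        while (is_xml_space(*p))
            p++;

        if (strncmp(p, "<?", 2) == 0)           /* XML declaration or PI */
        {
            const char *end = strstr(p + 2, "?>");
            if (end == NULL)
                return false;                   /* unterminated: not clearly a document */
            p = end + 2;
        }
        else if (strncmp(p, "<!--", 4) == 0)    /* comment */
        {
            const char *end = strstr(p + 4, "-->");
            if (end == NULL)
                return false;
            p = end + 3;
        }
        else
            return strncmp(p, "<!DOCTYPE", 9) == 0;
    }
}
```

Anything that is not clearly a document (including unterminated constructs)
falls through to the content parse, which then reports whatever error libxml
finds.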

I decided, for a first point of reference, to wear the green eyeshade and
write a pre-check that exactly implements the applicable rules. That gives
a starting point for simplifications that are probably safe.

For example, a bunch of lines at the end have to do with verifying the
content inside of a processing-instruction, after finding where it ends.
We could reasonably decide that, for the purpose of skipping it, knowing
where it ends is enough, as libxml will parse it next and report any errors
anyway.
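To make the strict-versus-relaxed distinction concrete, here is a hedged
sketch of skipping one PI either way. The name skip_pi and its strict flag are
hypothetical, and the strict check is an ASCII-only approximation of the XML
Name production (real NameStartChar/NameChar also admit many non-ASCII
ranges). It does check the PITarget rules: a Name, not "xml" in any case, and
followed by whitespace or "?>". Relaxed mode only finds where the PI ends.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASCII-only approximation of NameStartChar / NameChar (assumption). */
static bool name_start(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || c == ':';
}

static bool name_char(char c)
{
    return name_start(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
}

/*
 * Hypothetical helper: p points at "<?". Returns the position just past the
 * closing "?>", or NULL if the PI is unterminated or, in strict mode, has an
 * invalid target. Relaxed mode leaves all content checks to libxml.
 */
static const char *skip_pi(const char *p, bool strict)
{
    const char *end = strstr(p + 2, "?>");

    if (end == NULL)
        return NULL;                            /* no end found */

    if (strict)
    {
        const char *t = p + 2;                  /* start of PITarget */
        const char *q = t;

        if (!name_start(*q))
            return NULL;                        /* PITarget must be a Name */
        while (name_char(*q))                   /* '?' is not a name char, so q stops by end */
            q++;
        if (q - t == 3 &&
            tolower((unsigned char) t[0]) == 'x' &&
            tolower((unsigned char) t[1]) == 'm' &&
            tolower((unsigned char) t[2]) == 'l')
            return NULL;                        /* target "xml" is reserved */
        if (q != end && !(*q == ' ' || *q == '\t' || *q == '\r' || *q == '\n'))
            return NULL;                        /* target must be followed by S or "?>" */
    }
    return end + 2;
}
```

Note that strict mode rejects "<?xml ...?>" because, as a PI, that target is
reserved; a real pre-check has to recognize the XML declaration separately
before applying this.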

That would slightly violate my intention of sending input to the parser
that wasn't asked for /only/ when it's completely clear (from the prefix
we've seen) that that's where it should go. The relaxed version would do
that in the completely-clear cases and also in cases where an invalid PI
precedes what looks like a DTD. But you'd pretty much expect both parsers
to produce the same message for a bad PI anyway.

That made me want to just try it, and--surprise!--the messages from
libxml are not the same. So I would lean toward keeping the green-eyeshade
rules in the pre-check, if you can stomach them, but I would understand
taking them out.

Regards,
-Chap

[1] https://www.postgresql.org/message-id/5C8ECAA4.3090301@anastigmatix.net
