Re: PostgreSQL vs SQL/XML Standards

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To:
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PostgreSQL vs SQL/XML Standards
Date: 2019-02-11 15:51:25
Message-ID: 3e8eab9e-7289-6c23-5e2c-153cccea2257@anastigmatix.net
Lists: pgsql-hackers

[Resending to list so commitfest app will see it; the list blocked
this message the first time on a mail reputation issue. Sorry for
the duplication. I've removed the individual cc:s from this message.]

On 02/05/19 23:16, Chapman Flack wrote:
> I wonder whether, given the move to next CF, it makes sense to change
> the title of the CF entry from "XMLTABLE" to, more generically, XML
> improvements, and get one or two more small changes in:

Interpreting the crickets as approval, I have changed the title of the
CF entry, and the status back to Needs Review, with these patches
attached:

xmltable-xpath-result-processing-bugfix-6.patch
xmltable-xmlexists-passing-mechanisms-3.patch
xml-functions-type-docfix-2.patch
xml-content-2006-1.patch

That last one is new, and everything is rebased (onto 068503c).

xmltable-xpath-result-processing-bugfix-6.patch includes a regress/expected
output for the no-libxml case that was left out of -5.

xml-functions-type-docfix-2.patch removes one more sentence I had meant
to remove[1] but had forgotten to.

xml-content-2006-1.patch does this:

> - get XMLPARSE(CONTENT... (and cast-to-xml with XMLOPTION=content) to
> succeed even for content with DTDs, so that the content subtype really
> does fully include the document subtype, aligning it with the SQL:2006+
> standard. I think this would be a simple patch that I can deliver early
> this month, and Tom found reports where the current behavior already
> bites people in pg_restore. Its only effect would be to allow a currently-
> failing case to succeed (and stop biting people).

It works as suggested in [2], just by intercepting the error if a
parse-as-content trips over a DTD, and retrying as a parse-as-document.
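A minimal sketch of that control flow, for readers who have not opened
the patch. This is not the patch's code: the parse_result enum, the
parse_fn type, both toy_* parsers, and the document_parse_calls counter
are hypothetical names invented for illustration, and the toy parsers
only imitate the relevant outcomes without calling libxml2.

```c
#include <string.h>

/* Hypothetical outcome codes; the real code reports errors differently. */
typedef enum
{
    PARSE_OK,
    PARSE_FAILED_ON_DTD,
    PARSE_FAILED_OTHER
} parse_result;

typedef parse_result (*parse_fn) (const char *input);

/* Counts document-mode attempts, to show the retry is only paid on demand. */
int document_parse_calls = 0;

/* Toy stand-in for parse-as-content: it trips over any DTD. */
parse_result
toy_parse_as_content(const char *input)
{
    if (strstr(input, "<!DOCTYPE") != NULL)
        return PARSE_FAILED_ON_DTD;
    return PARSE_OK;
}

/* Toy stand-in for parse-as-document: accepts anything starting with '<'. */
parse_result
toy_parse_as_document(const char *input)
{
    document_parse_calls++;
    return (input[0] == '<') ? PARSE_OK : PARSE_FAILED_OTHER;
}

/*
 * Try content mode first; retry in document mode only when the failure
 * was caused by a DTD. Any other result is returned unchanged.
 */
parse_result
parse_content_with_fallback(const char *input,
                            parse_fn as_content,
                            parse_fn as_document)
{
    parse_result r = as_content(input);

    if (r == PARSE_FAILED_ON_DTD)
        r = as_document(input);
    return r;
}
```

Input without a DTD never reaches the document parser, so the common
case pays nothing extra; only the DTD case pays for a second parse.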

While that has a certain hacky smell, it also has the advantage of
handling what's probably an uncommon edge case in a way that adds no
upfront cost. (Other, 'tidier' approaches could involve evaluating a
regex first to decide how to parse--I believe everything that's allowed
ahead of a DTD makes a regular language--but that would add cycles to
every parse.)
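For comparison, the pre-scan alternative might look roughly like the
sketch below, written as a hand-rolled scanner rather than a regex. It
assumes the only things that can precede a DTD are whitespace, comments,
and processing instructions (with the XML declaration treated as just
another processing instruction). The name starts_with_doctype is
hypothetical, and none of this is in the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Skip XML whitespace: space, tab, CR, LF. */
static const char *
skip_xml_whitespace(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;
    return p;
}

/*
 * Return true if the input, after any leading whitespace, comments, and
 * processing instructions, continues with "<!DOCTYPE".
 */
bool
starts_with_doctype(const char *input)
{
    const char *p = input;

    for (;;)
    {
        const char *end;

        p = skip_xml_whitespace(p);

        if (strncmp(p, "<?", 2) == 0)
        {
            /* Processing instruction or XML declaration: skip to "?>". */
            end = strstr(p + 2, "?>");
            if (end == NULL)
                return false;
            p = end + 2;
        }
        else if (strncmp(p, "<!--", 4) == 0)
        {
            /* Comment: skip to "-->". */
            end = strstr(p + 4, "-->");
            if (end == NULL)
                return false;
            p = end + 3;
        }
        else
            return strncmp(p, "<!DOCTYPE", 9) == 0;
    }
}
```

Even this cheap scan runs on every input, which is the upfront cost the
retry approach avoids.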

In xml.c one does find the following comment:

* TODO maybe libxml2's xmlreader is better? (do not construct DOM,
* yet do not use SAX - see xmlreader.c)

and yes, I think a complete rewrite of xml_parse along those lines would
probably be a substantial win (why construct an internal DOM just to confirm
that the input is parsable, then throw it away?). But that would be a more
involved rewrite that I'm not volunteering to do.

This patch is a quick way to get the desired behavior given the current
implementation.

-Chap

[1] https://www.postgresql.org/message-id/5C4A94A5.8010402%40anastigmatix.net
[2] https://www.postgresql.org/message-id/5C4BDBFF.6040905%40anastigmatix.net

Attachment                                       Content-Type  Size
xmltable-xpath-result-processing-bugfix-6.patch  text/x-patch  14.4 KB
xmltable-xmlexists-passing-mechanisms-3.patch    text/x-patch  5.8 KB
xml-functions-type-docfix-2.patch                text/x-patch  30.1 KB
xml-content-2006-1.patch                         text/x-patch  16.4 KB
