Re: Question about xmloption and pg_restore

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Stefan Fercot <stefan(dot)fercot(at)dalibo(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about xmloption and pg_restore
Date: 2018-10-25 13:25:31
Message-ID: 5BD1C44B.6040300@anastigmatix.net
Lists: pgsql-hackers

On 10/25/18 05:02, Tom Lane wrote:
> Chapman Flack <chap(at)anastigmatix(dot)net> writes:
>> a difference between the 2003 SQL/XML standard (which PG implements) and
>> the later versions, which changed the data model so there really is a
>> containment relationship between 'content' and 'document'.
>> https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XML_OPTION
>
> See also
> https://www.postgresql.org/message-id/flat/153478795159.1302.9617586466368699403%40wrigleys.postgresql.org
>
> It's odd that people are just reporting this now when it's been like that
> for quite a few years, but anyway we've got a problem. Sounds like maybe
> adopting the later standards' definitions would fix it? Although I have
> no idea how complicated that'd be.

Supporting the later standards in full would be commendable, but it
is real work:

https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Possible_ways_forward

OTOH, making the current XML parsing not fail in this particular case
(which could be viewed as adopting the later standards' relationship
of CONTENT to DOCUMENT) might just be as simple as having the current
parsing code for CONTENT detect whether the string "starts with" a
<!DOCTYPE and fall back to the existing parsing code for DOCUMENT
if it does.

... where "starts with" actually means "possibly following some
whitespace, comments, or PIs, but you can stop looking if you see
a start-element", so essentially a port to C of:

https://github.com/tada/pljava/blob/V1_5_1/pljava/src/main/java/org/postgresql/pljava/jdbc/SQLXMLImpl.java#L409

which decides whether the input should be passed straight to the DOCUMENT-
style parser or somehow treated specially to parse as CONTENT. In Java
the special treatment involves a wrapping element, in xml.c it involves
calling a different libxml2 function, xmlParseBalancedChunkMemory, but
the choice of which method to apply is the same choice.
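
To make the "starts with" test concrete, here is a rough sketch in C.
It is not the PL/Java code and not anything that exists in xml.c; the
name starts_with_doctype is hypothetical, and it assumes a
NUL-terminated input in an ASCII-compatible encoding. It skips
whitespace, comments, and PIs (the XML declaration looks like a PI and
is skipped the same way), and stops at the first thing that is none of
those:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of xml.c): report whether the input
 * begins with a DOCTYPE declaration, possibly preceded by whitespace,
 * comments, or processing instructions.  Anything else seen first
 * (a start-element, character data, end of input) ends the search.
 */
static bool
starts_with_doctype(const char *s)
{
    for (;;)
    {
        /* Skip XML whitespace: space, tab, CR, LF. */
        while (*s == ' ' || *s == '\t' || *s == '\r' || *s == '\n')
            s++;

        if (strncmp(s, "<!--", 4) == 0)
        {
            /* Comments do not nest, so the first "-->" closes it. */
            const char *end = strstr(s + 4, "-->");

            if (end == NULL)
                return false;   /* unterminated; let the parser complain */
            s = end + 3;
        }
        else if (strncmp(s, "<?", 2) == 0)
        {
            /* A PI (or the XML declaration) ends at the first "?>". */
            const char *end = strstr(s + 2, "?>");

            if (end == NULL)
                return false;
            s = end + 2;
        }
        else
        {
            /* First significant construct: DOCTYPE or not. */
            return strncmp(s, "<!DOCTYPE", 9) == 0;
        }
    }
}
```

If this returns true, the caller would hand the string to the existing
DOCUMENT path; otherwise it would carry on with the CONTENT path as it
does today.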

IIRC, XML comments don't nest, so it may be that "possibly following
some whitespace, comments, or PIs" could be shown to be a regular language,
and checked with a regex. I did it the more explicit way in Java for
clarity, and because the API was there, and so I wouldn't have to think
about it.
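
For what the regex variant might look like, here is an untested-in-PG
sketch using POSIX <regex.h> purely for illustration (an assumption on
my part; the backend would presumably use its own regex facilities).
The pattern and the name starts_with_doctype_re are hypothetical. The
comment branch relies on well-formed comments not containing "--"; the
PI branch matches up to the first "?>":

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical pattern: any run of whitespace, comments, and PIs,
 * followed by "<!DOCTYPE", anchored at the start of the string.
 * The \t, \r, \n escapes are expanded by the C compiler, so the
 * bracket expression contains the literal characters.
 */
static const char doctype_prolog_re[] =
    "^([ \t\r\n]"                       /* whitespace */
    "|<!--([^-]|-[^-])*-->"             /* comment with no "--" inside */
    "|<\\?([^?]|\\?+[^?>])*\\?+>"       /* PI up to the first "?>" */
    ")*<!DOCTYPE";

/* Hypothetical helper: regex form of the "starts with" test. */
static bool
starts_with_doctype_re(const char *s)
{
    regex_t     re;
    bool        found;

    if (regcomp(&re, doctype_prolog_re, REG_EXTENDED | REG_NOSUB) != 0)
        return false;           /* treat a compile failure as "no" */

    found = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

Compiling the pattern on every call is wasteful; a real version would
compile it once. The explicit scan is probably still easier to read.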

-Chap
