BUG #15342: pg_dump - XML with mixed content types generates invalid backup file

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: ryan(at)rustprooflabs(dot)com
Subject: BUG #15342: pg_dump - XML with mixed content types generates invalid backup file
Date: 2018-08-20 17:59:11
Message-ID: 153478795159.1302.9617586466368699403@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 15342
Logged by: Ryan Lambert
Email address: ryan(at)rustprooflabs(dot)com
PostgreSQL version: 9.6.7
Operating system: Ubuntu 16; Ubuntu 18; Raspbian (Pi)
Description:

Greetings!

It seems that `pg_dump` is unable to produce reliable database backups
that include specific combinations of XML data. The following SQL Fiddle
creates a table with three rows of XML data. The first row, "Document, no
DOCTYPE", is the only one of the three that will always load from a
`pg_dump` backup. I've tried this on a few minor versions of 9.6 and 9.5.

http://sqlfiddle.com/#!17/78a83/1/0

The second row includes a DOCTYPE declaration in the XML. Restoring this
row from a `pg_dump` backup fails unless you add `SET XML OPTION DOCUMENT;`
to the dump file. Trying to restore the file without that statement
returns:

```
ERROR: invalid XML content
DETAIL: line 2: StartTag: invalid element name
<!DOCTYPE document SYSTEM "subjects.dtd">
^
CONTEXT: COPY xml_doc, line 2, column data: "<?xml version="1.0"
standalone="no"?>
<!DOCTYPE document SYSTEM "subjects.dtd">
<document>
<..."
```

The third row restores with the default setting (`CONTENT`) but fails if
`SET XML OPTION DOCUMENT;` is set:

```
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
abc<foo>bar</foo><bar>foo</bar>
^
CONTEXT: COPY xml_doc, line 3, column data:
"abc<foo>bar</foo><bar>foo</bar>"
```

So it seems that if a table holds both XML documents that include a
<!DOCTYPE> declaration and XML that is just content fragments, no single
XML OPTION setting lets the `pg_dump` output restore cleanly. It won't work
without manual tinkering and headaches.
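
For anyone who can't open the fiddle, here is a minimal sketch of the mix
described above. It assumes a table `xml_doc` with a single `xml` column
`data` (both names are taken from the COPY context in the errors; the table
in the fiddle may have more columns). The body of the DOCTYPE document is a
placeholder, because the error output truncates the real one.

```sql
-- Assumed table layout; only the names xml_doc and data come from the errors above.
CREATE TABLE xml_doc (data xml);

-- A full document with a DOCTYPE declaration. On the affected versions this
-- value is only accepted while XML OPTION is DOCUMENT.
SET XML OPTION DOCUMENT;
INSERT INTO xml_doc (data) VALUES (
'<?xml version="1.0" standalone="no"?>
<!DOCTYPE document SYSTEM "subjects.dtd">
<document><placeholder/></document>');  -- placeholder body, not the original data

-- A content fragment (leading text, two sibling elements). This value is
-- only accepted while XML OPTION is CONTENT, which is the default.
SET XML OPTION CONTENT;
INSERT INTO xml_doc (data) VALUES ('abc<foo>bar</foo><bar>foo</bar>');
```

Dumping this table with `pg_dump` and restoring it under either setting
should hit one of the two errors shown above, since each row needs a
different XML OPTION to be parsed.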

The specific data that is tripping me up is the QGIS layer style data
(stored in `public.layer_styles`).
