Re: [PATCH] Add pretty-printed XML output option

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-14 22:57:22
Message-ID: abd25443-ef6d-7b8a-c593-a2a991d3e5ce@uni-muenster.de
Lists: pgsql-hackers

On 14.03.23 18:40, Tom Lane wrote:
> Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> writes:
>> [ v22-0001-Add-pretty-printed-XML-output-option.patch ]
> I poked at this for awhile and ran into a problem that I'm not sure
> how to solve: it misbehaves for input with embedded DOCTYPE.
>
> regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent);
> xmlserialize
> --------------
> <!DOCTYPE a>+
> <a></a> +
>
> (1 row)

The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements
to be serialized with start-end tag pairs. Removing it did the trick ...

postgres=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDENT);
 xmlserialize
--------------
 <!DOCTYPE a>+
 <a/>        +

(1 row)

... but as a side effect, empty start/end tag pairs will now be
serialized as empty-element tags:

postgres=# SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text INDENT);
 xmlserialize
--------------
 <foo>       +
   <bar/>    +
 </foo>
(1 row)

This seems to be the standard behavior of other XML indentation tools
(including Oracle).
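
For readers who want to see the trade-off in isolation: <bar></bar> and
<bar/> describe the same element, so which spelling comes out is purely a
serializer choice. The sketch below is not the patch and does not use
libxml2; it swaps in Python's standard-library ElementTree, whose
short_empty_elements argument plays a role roughly opposite to
XML_SAVE_NO_EMPTY. The helper name serialize_sketch is made up for
illustration, and ET.indent needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def serialize_sketch(xml_text: str, collapse_empty: bool) -> str:
    """Indent xml_text and serialize it, choosing how empty elements are written.

    collapse_empty=True  -> empty elements are written as a single tag
    collapse_empty=False -> empty elements are written as a start/end pair
    """
    root = ET.fromstring(xml_text)
    ET.indent(root, space="  ")  # in-place pretty-printing (Python 3.9+)
    return ET.tostring(
        root,
        encoding="unicode",
        short_empty_elements=collapse_empty,
    )


if __name__ == "__main__":
    doc = "<foo><bar></bar></foo>"
    print(serialize_sketch(doc, collapse_empty=True))
    print(serialize_sketch(doc, collapse_empty=False))
```

Either output parses back to the same tree, which is why dropping the flag
changes the text but not the XML value. Note that ElementTree writes the
collapsed form with a space before the slash, unlike libxml2.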

> regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent);
> xmlserialize
> --------------
>
> (1 row)
>
> The bad result for CONTENT is because xml_parse() decides to
> parse_as_document, but xmlserialize_indent has no idea that happened
> and tries to use the content_nodes list anyway. I don't especially
> care for the laissez faire "maybe we'll set *content_nodes and maybe
> we won't" API you adopted for xml_parse, which seems to be contributing
> to the mess. We could pass back more info so that xmlserialize_indent
> knows what really happened.

I added a new (nullable) parameter to the xml_parse function that
returns the XmlOptionType actually used to parse the XML data. Now
xmlserialize_indent knows how the data was really parsed:

postgres=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDENT);
 xmlserialize
--------------
 <!DOCTYPE a>+
 <a/>        +

(1 row)
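
The idea of reporting back how the input was really parsed can be sketched
outside the patch as follows. This is only an illustration in Python, not
the C code in xml.c: the names XmlOption and parse_sketch are hypothetical,
the DOCTYPE check is a simplified regular expression, and the function
returns the effective mode as a second value where the C function would
fill in an optional out-parameter.

```python
import re
import xml.etree.ElementTree as ET
from enum import Enum


class XmlOption(Enum):
    DOCUMENT = "document"
    CONTENT = "content"


# Simplified assumption: an optional XML declaration followed by a DOCTYPE.
_DOCTYPE_RE = re.compile(r"\s*(?:<\?xml[^>]*\?>\s*)?<!DOCTYPE")


def parse_sketch(data: str, requested: XmlOption):
    """Parse data and return (top_level_elements, parsed_as).

    parsed_as can differ from requested: CONTENT input that carries a
    DOCTYPE is parsed as a document, and the caller is told so.
    """
    parsed_as = requested
    if requested is XmlOption.CONTENT and _DOCTYPE_RE.match(data):
        parsed_as = XmlOption.DOCUMENT

    if parsed_as is XmlOption.DOCUMENT:
        nodes = [ET.fromstring(data)]  # exactly one root element
    else:
        # Wrap the fragment so several top-level elements can be parsed.
        # Top-level text nodes are ignored in this sketch.
        nodes = list(ET.fromstring(f"<dummy>{data}</dummy>"))
    return nodes, parsed_as


if __name__ == "__main__":
    nodes, parsed_as = parse_sketch("<!DOCTYPE a><a/>", XmlOption.CONTENT)
    # The caller branches on parsed_as, not on what it asked for.
    print(parsed_as.name, [n.tag for n in nodes])
```

With the effective mode in hand, the caller can pick the document or the
content serialization path based on what actually happened instead of
guessing from the requested option.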

I added test cases for these queries.

v23 attached.

Thanks!

Best, Jim

Attachment Content-Type Size
v23-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 39.5 KB
