Re: [PATCH] Add pretty-printed XML output option

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-14 17:40:25
Message-ID: 2752578.1678815625@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> writes:
> [ v22-0001-Add-pretty-printed-XML-output-option.patch ]

I poked at this for awhile and ran into a problem that I'm not sure
how to solve: it misbehaves for input with embedded DOCTYPE.

regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent);
xmlserialize
--------------
<!DOCTYPE a>+
<a></a> +

(1 row)

regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent);
xmlserialize
--------------

(1 row)

The bad result for CONTENT is because xml_parse() decides to
parse_as_document, but xmlserialize_indent has no idea that happened
and tries to use the content_nodes list anyway. I don't especially
care for the laissez faire "maybe we'll set *content_nodes and maybe
we won't" API you adopted for xml_parse, which seems to be contributing
to the mess. We could pass back more info so that xmlserialize_indent
knows what really happened. However, that won't fix the bogus output
for the DOCUMENT case. Are we perhaps passing incorrect flags to
xmlSaveToBuffer?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-03-14 17:45:21 DROP DATABASE is interruptible
Previous Message Jeff Davis 2023-03-14 17:10:42 Re: ICU locale validation / canonicalization