Re: [PATCH] Add pretty-printed XML output option

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-13 12:08:11
Lists: pgsql-hackers

On 10.03.23 15:32, Tom Lane wrote:
> Jim Jones<jim(dot)jones(at)uni-muenster(dot)de> writes:
>> On 09.03.23 21:21, Tom Lane wrote:
>>> I've looked through this now, and have some minor complaints and a major
>>> one. The major one is that it doesn't work for XML that doesn't satisfy
>>> IS DOCUMENT. For example,
>> How do you suggest the output should look like?
> I'd say a series of node trees, each starting on a separate line.

The attached v22 enables the use of INDENT with XML that is not singly rooted.

postgres=# SELECT xmlserialize (CONTENT '<bar><val x="y">42</val></bar><foo>73</foo>' AS text INDENT);
 <bar>                +
   <val x="y">42</val>+
 </bar>               +
(1 row)

I tried several libxml2 dump functions, and none of them coped well with
an XML string that has no root node. So I added the parsed nodes to a
temporary root node, so that I could iterate over its children and append
them one by one (formatted) to the output buffer.
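
To illustrate the temporary-root idea, here is a minimal, self-contained
sketch. It is a toy model and not the code from the patch: it makes no
libxml2 calls, and ToyNode, ToyBuf, dump_node, format_forest and
format_sample are hypothetical names chosen only for this example. The
two-space indentation is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for an XML element node; NOT libxml2's xmlNode. */
typedef struct ToyNode
{
    const char *name;           /* element name */
    const char *attrs;          /* raw attribute text, or NULL */
    const char *text;           /* text content, or NULL */
    struct ToyNode *children;   /* first child, or NULL */
    struct ToyNode *next;       /* next sibling, or NULL */
} ToyNode;

/* Simple bounded output buffer (hypothetical helper). */
typedef struct
{
    char   *data;
    size_t  len;
    size_t  cap;
} ToyBuf;

static void
buf_append(ToyBuf *buf, const char *s)
{
    size_t n = strlen(s);

    assert(buf->len + n < buf->cap);    /* sketch only: no reallocation */
    memcpy(buf->data + buf->len, s, n + 1);
    buf->len += n;
}

/* Emit one node tree, indenting two spaces per nesting level. */
static void
dump_node(ToyBuf *buf, const ToyNode *node, int depth)
{
    for (int i = 0; i < depth; i++)
        buf_append(buf, "  ");
    buf_append(buf, "<");
    buf_append(buf, node->name);
    if (node->attrs)
    {
        buf_append(buf, " ");
        buf_append(buf, node->attrs);
    }
    buf_append(buf, ">");

    if (node->children)
    {
        buf_append(buf, "\n");
        for (const ToyNode *c = node->children; c; c = c->next)
        {
            dump_node(buf, c, depth + 1);
            buf_append(buf, "\n");
        }
        for (int i = 0; i < depth; i++)
            buf_append(buf, "  ");
    }
    else if (node->text)
        buf_append(buf, node->text);

    buf_append(buf, "</");
    buf_append(buf, node->name);
    buf_append(buf, ">");
}

/*
 * Hang the sibling list under a temporary root, then dump each child
 * separately; the temporary root itself never reaches the output.
 */
void
format_forest(ToyNode *nodes, char *out, size_t cap)
{
    ToyNode tmp_root = {"tmp-root", NULL, NULL, nodes, NULL};
    ToyBuf  buf = {out, 0, cap};

    out[0] = '\0';
    for (const ToyNode *c = tmp_root.children; c; c = c->next)
    {
        if (buf.len > 0)
            buf_append(&buf, "\n");     /* each tree starts on its own line */
        dump_node(&buf, c, 0);
    }
}

/* Build the <bar>/<foo> sample from this mail and format it into out. */
void
format_sample(char *out, size_t cap)
{
    ToyNode val = {"val", "x=\"y\"", "42", NULL, NULL};
    ToyNode foo = {"foo", NULL, "73", NULL, NULL};
    ToyNode bar = {"bar", NULL, NULL, &val, &foo};

    format_forest(&bar, out, cap);
}
```

Calling format_sample() with a 256-byte buffer fills it with the two node
trees, each starting on a separate line. Because only the children of the
temporary root are dumped, the wrapper element never appears in the result.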

I slightly modified the existing xml_parse() function to return the list
of nodes parsed by xmlParseBalancedChunkMemory:

xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
          int encoding, Node *escontext, xmlNodePtr *parsed_nodes)

res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
                                       utf8string + count, parsed_nodes);

>> I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
>> uses the encoding of the given doc and UTF8 if not provided.
> Mmmm .... doing this differently from what we do elsewhere does not
> sound like the right path forward. The input *is* (or had better be)
> in the database encoding.
I changed that behavior. It now uses GetDatabaseEncoding().


Best, Jim

Attachment: v22-0001-Add-pretty-printed-XML-output-option.patch (text/x-patch, 35.3 KB)