Re: [PATCH] Add pretty-printed XML output option

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-15 21:13:36
Message-ID: 3622266.1678914816@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Huh, interesting. That is a legitimate pretty-fication of the input,
> I suppose, but some people might think it goes beyond the charter of
> "indentation". I'm okay with it personally; anyone want to object?

Hearing no objections to that, I moved ahead with this.

It occurred to me to test v23 for memory leaks, and it had bad ones:

* the "newline" node used in the CONTENT case never got freed.
Making another one for each line wasn't helping, either.

* libxml, at least in the 2.9.7 version I have here, turns out to
leak memory if you pass a non-null encoding to xmlSaveToBuffer.
But AFAICS we don't really need to do that, because the last thing
we want is for libxml to try to do any encoding conversion.

After cleaning that up, I saw that we were indeed doing essentially
duplicative xml_parse calls for the DOCUMENT check and the indentation
work, so I refactored to allow just one call to serve.

Pushed with those changes and some other cosmetic cleanup.
Thanks for working so hard on this!

(Now to keep an eye on the buildfarm, to see if other versions of
libxml work like mine ...)

BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like

do $$
declare x xml; t text;
begin
x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
for i in 1..10000000 loop
t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;

That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad. That problem is not the fault of this patch,
I don't think. I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-03-15 21:13:56 Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Previous Message Andrew Dunstan 2023-03-15 20:43:33 Re: Add support for DEFAULT specification in COPY FROM