Re: Regression with large XML data input

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Erik Wienhold <ewie(at)ewie(dot)name>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Treat <rob(at)xzilla(dot)net>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Regression with large XML data input
Date: 2025-07-29 18:02:30
Message-ID: a8771e75-60ee-4c99-ae10-ca4832e1ec8d@uni-muenster.de
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29.07.25 17:27, Tom Lane wrote:
> Re-reading the prior thread, I see that my memory above is quite
> faulty: we added the node_list intermediate variable as a way to
> detect errors when the return code from xmlParseBalancedChunkMemory
> couldn't be trusted. So I think you're right to question whether we
> still need it. I tried reverting to just passing parsed_nodes, and
> I don't see any leak in either the normal or error paths --- so at
> least with the quite-old version of libxml2 I'm testing, there is no
> such bug.
>
>> I went through the discussions and the libxml2 issue, and I also think
>> it is prudent to keep it like that :)
> I've got mixed feelings about it now. I think the $64 question
> is whether there are any cases in which xmlParseBalancedChunkMemory
> thinks things are fine (and returns a node list) but then we conclude
> there's an error, perhaps as a consequence of xmlerrcxt->err_occurred
> having become set earlier. That's a little bit of a stretch.
>
> In any case, I now realize that I broke that scenario yesterday
> because I forgot that xml_errsave could throw a longjmp --- so freeing
> the node list after calling it is too late :-(
>
> On the whole I'm inclined to revert to the previous coding without
> a node_list variable.

I also didn't spot any leaks, but I was rather hesitant to remove it
after re-reading the code, since there's still a risk of leakage if the
caller fails to free parsed_nodes in case of an error. However, it seems
that only xmltotext_with_options relies on this, and even then, the
result of parsed_nodes is added to a document that gets freed in case of
failure, so it looks like we're covered.

Best regards, Jim

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2025-07-29 18:13:14 Re: pg_dump --with-* options
Previous Message Jeff Davis 2025-07-29 17:41:05 Re: pg_dump --with-* options