From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> |
Cc: | Erik Wienhold <ewie(at)ewie(dot)name>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Treat <rob(at)xzilla(dot)net>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Regression with large XML data input |
Date: | 2025-07-29 15:27:37 |
Message-ID: | 997668.1753802857@sss.pgh.pa.us |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> writes:
> On 29.07.25 14:11, Tom Lane wrote:
>> In the original coding, there was a hazard of the node list getting
>> leaked if the caller passed parsed_nodes == NULL. Or at least I
>> thought there was. It may be that all releases of libxml2 are smart
>> enough to free the node list if there's no way to pass it back,
>> but I guess we had reason not to trust it. Possibly there's something
>> about that in the discussion that led up to 6082b3d5d, though I see
>> I neglected to mention it in the commit message.
> I see.. thanks for explaining.
Re-reading the prior thread, I see that my memory above is quite
faulty: we added the node_list intermediate variable as a way to
detect errors when the return code from xmlParseBalancedChunkMemory
couldn't be trusted. So I think you're right to question whether we
still need it. I tried reverting to just passing parsed_nodes, and
I don't see any leak in either the normal or error paths --- so at
least with the quite-old version of libxml2 I'm testing, there is no
such bug.
> I went through the discussions and the libxml2 issue, and I also think
> it is prudent to keep it like that :)
I've got mixed feelings about it now. I think the $64 question
is whether there are any cases in which xmlParseBalancedChunkMemory
thinks things are fine (and returns a node list) but then we conclude
there's an error, perhaps as a consequence of xmlerrcxt->err_occurred
having become set earlier. That's a little bit of a stretch.
In any case, I now realize that I broke that scenario yesterday
because I forgot that xml_errsave could throw a longjmp --- so freeing
the node list after calling it is too late :-(
On the whole I'm inclined to revert to the previous coding without
a node_list variable.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2025-07-29 15:32:02 | Re: split func.sgml to separated individual sgml files |
Previous Message | Tom Lane | 2025-07-29 14:37:41 | Re: Making pgfdw_report_error statically analyzable |