From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Erik Wienhold <ewie(at)ewie(dot)name> |
Subject: | Re: Regression with large XML data input |
Date: | 2025-07-24 04:32:34 |
Message-ID: | aIG3Yn4dZ31G9WPO@paquier.xyz |
Lists: | pgsql-hackers |
On Wed, Jul 23, 2025 at 11:28:38PM -0400, Tom Lane wrote:
> Michael Paquier <michael(at)paquier(dot)xyz> writes:
>> Switching back to the previous code, where we rely on
>> xmlParseBalancedChunkMemory(), fixes the issue.
>
> Yeah, just reverting these commits might be an acceptable answer,
> since the main point was to work around a bleeding-edge bug:

Still, it is not possible to do exactly that on all the branches, is
it? The XMLSERIALIZE business requires passing some options to
xmlParseInNodeContext().

>>> * Early 2.13.x releases of libxml2 contain a bug that causes
>>> xmlParseBalancedChunkMemory to return the wrong status value in some
>>> cases. This breaks our regression tests. While that bug is now fixed
>>> upstream and will probably never be seen in any production-oriented
>>> distro, it is currently a problem on some more-bleeding-edge-friendly
>>> platforms.
>
> Presumably that problem is now gone, a year later. The other point
> about

I would probably agree that it does not seem worth caring about this
range of early 2.13 releases. I didn't mention it upthread, but all my
tests were done with Debian GID's libxml2, which, based on what apt is
telling me, seems to be a 2.12.7 flavor with some 2.9.14 pieces. I did
not test with a different upstream version, but I'm pretty sure it's
the same story.

>>> * xmlParseBalancedChunkMemory is considered to depend on libxml2's
>>> semi-deprecated SAX1 APIs, and will go away when and if they do.
>
> is still hypothetical I think. But we might want to keep this bit:

Upstream commit 4f329dc52490, added to the 2.14 branch, is probably
worth mentioning:

    parser: Implement xmlCtxtParseContent

    This implements xmlCtxtParseContent, a better alternative to
    xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a
    parser context and a parser input, making it a lot more versatile.

Given all our stable branches, I am not sure this should be
considered, but it seems worth keeping in mind.

>>> While here, avoid allocating an xmlParserCtxt in DOCUMENT parse mode,
>>> since that code path is not going to use it.

Are you planning to look at that for the next minor release? It would
take me a couple of hours to dig into all that, and I am sure I will
need your stamp or Erik's to avoid doing something stupid.

--
Michael