Re: Regression with large XML data input

From: Erik Wienhold <ewie(at)ewie(dot)name>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Regression with large XML data input
Date: 2025-07-24 19:01:11
Message-ID: 586b74bb-8941-420f-82c4-ba93e5b42b20@ewie.name
Lists: pgsql-hackers

On 2025-07-24 05:12 +0200, Michael Paquier wrote:
> Switching back to the previous code, where we rely on
> xmlParseBalancedChunkMemory() fixes the issue. A quick POC is
> attached. It fails one case in check-world with SERIALIZE because I
> am not sure it is possible to pass down some options through
> xmlParseBalancedChunkMemory(), still the regression is gone, and I am
> wondering if there is not a better solution to be able to dodge the
> original problem and still accept this case.

The whitespace can be preserved by calling xmlKeepBlanksDefault() before
parsing; see the attached v2. That function is deprecated, though, and
its setting is a global. But libxml2 uses thread-local globals, so it
should be safe. Other than that, I see no way to control
XML_PARSE_NOBLANKS with xmlParseBalancedChunkMemory(), since that
function takes no parser options argument.

[1] https://gitlab.gnome.org/GNOME/libxml2/-/blob/408bd0e18e6ddba5d18e51d52da0f7b3ca1b4421/parserInternals.c#L2833

--
Erik Wienhold

Attachment Content-Type Size
0001-Fix-xml2-regression-v2.patch text/plain 2.6 KB
