Re: Regression with large XML data input

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Treat <rob(at)xzilla(dot)net>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Erik Wienhold <ewie(at)ewie(dot)name>
Subject: Re: Regression with large XML data input
Date: 2025-07-28 02:09:33
Message-ID: aIbb3TOq3XP923e5@paquier.xyz
Lists: pgsql-hackers

On Fri, Jul 25, 2025 at 02:21:26PM -0400, Tom Lane wrote:
> I'll be the first to say that I'm not too pleased with it either.
> However, from Jim Jones' result upthread, a "minor update" of libxml2
> could also have caused this problem: 2.9.7 and 2.9.14 behave
> differently. So we don't have sole control --- or sole responsibility
> --- here.

After double-checking the behaviors I am seeing with local builds of
libxml2 2.9.7 and 2.9.14, I don't think this statement is correct. For
example, with the top of REL_15_STABLE, with and without
72c65d6658d4, I am getting the following (removing the exception in
the example makes no difference when the test succeeds):
- libxml2 2.9.7 + top of REL_15_STABLE => test failure
- libxml2 2.9.14 + top of REL_15_STABLE => test failure
- libxml2 2.9.7 + top of REL_15_STABLE + revert of 72c65d6658d4
=> test success
- libxml2 2.9.14 + top of REL_15_STABLE + revert of 72c65d6658d4
=> test success

So anyone using a version of libxml2 2.9.X would see the large data
case work with Postgres at 72c65d6658d4^1, and fail with 72c65d6658d4
and onwards. Looking at Postgres in isolation, the case no longer
works with any version of libxml2 in the 2.9.X series. The result
does not depend on the minor version within 2.9.X, only on whether we
link Postgres to a newer major version of libxml2.

Please note that this is also the behavior I see in a Debian GID
environment, and I guess in any existing Debian release: we rely on
libxml2 2.9.X, so a minor upgrade of Postgres is what triggers the
behavior change. It seems to me that there's an argument for
compatibility with the 2.9.X series, which still seems quite present
in the wild, and that we could choose one solution over the other in
xml_parse() based on LIBXML_VERSION. What I am seeing is that, at a
fixed major version of libxml2, Postgres holds the responsibility
here.
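
To illustrate the LIBXML_VERSION idea, here is a minimal sketch, not a
patch. LIBXML_VERSION is the macro from <libxml/xmlversion.h>, encoded
as major * 10000 + minor * 100 + micro. Everything else is
hypothetical: the cutoff value, the enum, and the function names are
made up for the example, and the fallback #define exists only so the
sketch compiles without the libxml2 headers.

```c
#include <assert.h>

/*
 * LIBXML_VERSION normally comes from <libxml/xmlversion.h> and encodes
 * major.minor.micro as major * 10000 + minor * 100 + micro, so 2.9.14
 * is 20914.  This fallback is only a stand-in for the real header.
 */
#ifndef LIBXML_VERSION
#define LIBXML_VERSION 20914
#endif

/*
 * Hypothetical cutoff: 21000, the first version number past the 2.9.X
 * series.  The right value would have to come from testing.
 */
#define XML_CURRENT_PATH_MIN_VERSION 21000

typedef enum
{
	XML_PATH_LEGACY,			/* behavior before 72c65d6658d4 */
	XML_PATH_CURRENT			/* behavior with 72c65d6658d4 */
} XmlParsePath;

/* Runtime form of the check, convenient for trying several versions. */
static XmlParsePath
choose_parse_path(int libxml_version)
{
	assert(libxml_version > 0);
	return (libxml_version < XML_CURRENT_PATH_MIN_VERSION) ?
		XML_PATH_LEGACY : XML_PATH_CURRENT;
}

/* Compile-time form, closer to what an #if in xml_parse() would do. */
#if LIBXML_VERSION < XML_CURRENT_PATH_MIN_VERSION
#define XML_PARSE_PATH XML_PATH_LEGACY
#else
#define XML_PARSE_PATH XML_PATH_CURRENT
#endif

/* Human-readable name of a path, for logging or debugging. */
static const char *
parse_path_name(XmlParsePath path)
{
	return (path == XML_PATH_LEGACY) ? "legacy" : "current";
}
```

With this cutoff, both 2.9.7 (20907) and 2.9.14 (20914) would take the
legacy path, which matches the fact that the two behave identically in
the tests above.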
--
Michael
