From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Treat <rob(at)xzilla(dot)net> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Erik Wienhold <ewie(at)ewie(dot)name> |
Subject: | Re: Regression with large XML data input |
Date: | 2025-07-25 18:21:26 |
Message-ID: | 1944118.1753467686@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Robert Treat <rob(at)xzilla(dot)net> writes:
> On Thu, Jul 24, 2025 at 8:08 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> If we were discussing this from the perspective of new code
>> added in a major version of Postgres, I would not argue
>> much about it: breakages happen every year and users adapt their
>> applications to them. Here, however, we are talking about a change in a
>> stable branch, across a minor version, which should be closer to
>> flawless from a user perspective?
> While I am pretty sympathetic to the idea that we hang our hats on
> "Postgres doesn't break things in minor version updates", and this
> seems to betray that, one scenario where we would break things is if
> it were the only reasonable option wrt a bug / security fix, which
> this seems potentially close to.
I'll be the first to say that I'm not too pleased with it either.
However, from Jim Jones' result upthread, a "minor update" of libxml2
could also have caused this problem: 2.9.7 and 2.9.14 behave
differently. So we don't have sole control --- or sole responsibility
--- here.
I'd be more excited about trying to avoid the failure if I were not
afraid that "avoid the failure" really means "re-expose a security
hazard". Why should we believe that if libxml2 throws a
resource-limit error (for identical inputs) in one code path and not
another, that's anything but a missed error check in the second path?
(Maybe this is the same thing Robert is saying, not quite sure.)
> There are a lot of public data sets that provide xml dumps as a
> generic format for "non-commercial databases", and those can often be
> quite large. I suspect we don't see those use cases a lot because
> historically users have been forced to resort to perl/python/etc
> scripts to convert the data prior to ingesting. Which is to say, I
> think these use cases are more common than we think, and if there were
> ever a stable implementation that supported these large use cases,
> we would start to see more of it.
Yeah, it's a real shame that we don't have more-reliable
infrastructure for XML. I'm not volunteering to fix it though...
regards, tom lane