Re: Regression with large XML data input

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Erik Wienhold <ewie(at)ewie(dot)name>
Subject: Re: Regression with large XML data input
Date: 2025-07-24 18:10:29
Message-ID: 1685956.1753380629@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Michael Paquier <michael(at)paquier(dot)xyz> writes:
>> A customer has reported a regression with the parsing of rather large
>> XML data, introduced by the set of backpatches done with f68d6aabb7e2
>> & friends.

> Bleah.

The supplied test case hides important details in the error message.
If you get rid of the exception block so that the error is reported
in full, what you see is

regression=# CREATE TEMP TABLE xmldata (id BIGINT PRIMARY KEY, message XML );
CREATE TABLE
regression=# DO $$ DECLARE size_40mb TEXT := repeat('X', 40000000);
regression$# BEGIN
regression$# INSERT INTO xmldata (id, message) VALUES
regression$# ( 1, (('<Root><Item><Name>Test40MB</Name><Content>' || size_40mb || '</Content></Item></Root>')::xml) );
regression$# END $$;
ERROR: invalid XML content
DETAIL: line 1: internal error: Huge input lookup
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
^
CONTEXT: SQL statement "INSERT INTO xmldata (id, message) VALUES
( 1, (('<Root><Item><Name>Test40MB</Name><Content>' || size_40mb || '</Content></Item></Root>')::xml) )"
PL/pgSQL function inline_code_block line 3 at SQL statement
regression=#

That is, what we are hitting is libxml2's internal protections
against processing "too large" input. I am not really sure
why the old coding failed to hit the same limit, but I wonder
if we shouldn't leave well enough alone. See commits 2197d0622
and f2743a7d7, where we tried to enable such cases and then
decided it was too risky. I'm afraid now that our prior coding
might have made billion-laughs-style cases reachable.
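
For anyone who would rather keep the test case's exception block,
here is a sketch. The original handler isn't quoted in this thread,
so the one below is hypothetical and assumes the original printed
only SQLERRM. PL/pgSQL's GET STACKED DIAGNOSTICS can also fetch
PG_EXCEPTION_DETAIL, which is where the libxml2 "Huge input lookup"
text shows up in the error above.

```sql
CREATE TEMP TABLE xmldata (id BIGINT PRIMARY KEY, message XML);

DO $$
DECLARE
  size_40mb  TEXT := repeat('X', 40000000);
  err_msg    TEXT;
  err_detail TEXT;
BEGIN
  INSERT INTO xmldata (id, message) VALUES
    (1, ('<Root><Item><Name>Test40MB</Name><Content>' || size_40mb ||
         '</Content></Item></Root>')::xml);
EXCEPTION WHEN others THEN
  -- Hypothetical handler: SQLERRM alone gives only the primary message
  -- ("invalid XML content"); libxml2's diagnosis is in the DETAIL field.
  GET STACKED DIAGNOSTICS err_msg    = MESSAGE_TEXT,
                          err_detail = PG_EXCEPTION_DETAIL;
  -- left() only keeps the notice short, since the detail echoes input text.
  RAISE NOTICE 'message: %, detail: %', err_msg, left(err_detail, 200);
END $$;
```

Drop the left() call to see the whole DETAIL, including the
line of input that libxml2 echoes back.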

regards, tom lane
