From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Erik Wienhold <ewie(at)ewie(dot)name> |
Subject: | Re: Regression with large XML data input |
Date: | 2025-07-24 18:10:29 |
Message-ID: | 1685956.1753380629@sss.pgh.pa.us |
Lists: | pgsql-hackers |

I wrote:
> Michael Paquier <michael(at)paquier(dot)xyz> writes:
>> A customer has reported a regression with the parsing of rather large
>> XML data, introduced by the set of backpatches done with f68d6aabb7e2
>> & friends.
> Bleah.

The supplied test case hides important details in the error message.
If you get rid of the exception block so that the error is reported
in full, what you see is

regression=# CREATE TEMP TABLE xmldata (id BIGINT PRIMARY KEY, message XML );
CREATE TABLE
regression=# DO $$ DECLARE size_40mb TEXT := repeat('X', 40000000);
regression$# BEGIN
regression$# INSERT INTO xmldata (id, message) VALUES
regression$# ( 1, (('<Root><Item><Name>Test40MB</Name><Content>' || size_40mb || '</Content></Item></Root>')::xml) );
regression$# END $$;
ERROR: invalid XML content
DETAIL: line 1: internal error: Huge input lookup
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
^
CONTEXT: SQL statement "INSERT INTO xmldata (id, message) VALUES
( 1, (('<Root><Item><Name>Test40MB</Name><Content>' || size_40mb || '</Content></Item></Root>')::xml) )"
PL/pgSQL function inline_code_block line 3 at SQL statement
regression=#

That is, what we are hitting is libxml2's internal protections
against processing "too large" input. I am not really sure
why the other coding failed to hit this same thing, but I wonder
if we shouldn't leave well enough alone. See commits 2197d0622
and f2743a7d7, where we tried to enable such cases and then
decided it was too risky. I'm afraid now that our prior coding
might have allowed billion-laugh-like cases to be reachable.

regards, tom lane