From: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Treat <rob(at)xzilla(dot)net>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Erik Wienhold <ewie(at)ewie(dot)name> |
Subject: | Re: Regression with large XML data input |
Date: | 2025-07-28 07:45:14 |
Message-ID: | cc0bd778-9730-4ef9-98b3-a965f8895331@uni-muenster.de |
Lists: | pgsql-hackers |
On 28.07.25 04:47, Michael Paquier wrote:
> I understand that from the point of view of a
> maintainer this is rather bad, but from the customer point of view the
> current situation is also bad to deal with in the scope of a minor
> upgrade, because applications suddenly break.
I totally get it --- from the user’s perspective, it’s hard to see this
as a bugfix.
I was wondering whether using XML_PARSE_HUGE in xml_parse's options
could help address this, for example:
    options = XML_PARSE_NOENT | XML_PARSE_DTDATTR | XML_PARSE_HUGE
        | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS);
According to libxml2's parserInternals.h:
/**
* Maximum size allowed for a single text node when building a tree.
* This is not a limitation of the parser but a safety boundary feature,
* use XML_PARSE_HUGE option to override it.
* Introduced in 2.9.0
*/
#define XML_MAX_TEXT_LENGTH 10000000
/**
* Maximum size allowed when XML_PARSE_HUGE is set.
*/
#define XML_MAX_HUGE_LENGTH 1000000000
The XML_MAX_TEXT_LENGTH limit is what we're hitting now, but
XML_MAX_HUGE_LENGTH is extremely generous. Here's a quick PoC using
XML_PARSE_HUGE:
psql (19devel)
Type "help" for help.
postgres=# CREATE TABLE xmldata (message xml);
CREATE TABLE
postgres=# DO $$
DECLARE huge_size text := repeat('X', 1000000000);
BEGIN
INSERT INTO xmldata (message) VALUES
((('<foo><bar>' || huge_size ||'</bar></foo>')::xml));
END $$;
DO
postgres=# SELECT pg_size_pretty(length(message::text)::bigint) FROM
xmldata;
pg_size_pretty
----------------
954 MB
(1 row)
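To put those numbers side by side, here is a minimal C sketch (not libxml2 code) that checks a text node's size against the two limits quoted above. The helper text_node_fits and the SKETCH_* constants are hypothetical stand-ins; real code would take the macros from parserInternals.h, and the exact comparison libxml2 performs internally is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the limits quoted from parserInternals.h (stand-ins). */
#define SKETCH_MAX_TEXT_LENGTH 10000000UL
#define SKETCH_MAX_HUGE_LENGTH 1000000000UL

/* The huge limit is 100 times the default one. */
static_assert(SKETCH_MAX_HUGE_LENGTH == 100 * SKETCH_MAX_TEXT_LENGTH,
              "huge limit should be 100x the default limit");

/*
 * Hypothetical helper: would a single text node of `len` bytes stay
 * within the cap that applies with or without the huge option?
 */
static bool
text_node_fits(size_t len, bool huge)
{
    size_t limit = huge ? SKETCH_MAX_HUGE_LENGTH : SKETCH_MAX_TEXT_LENGTH;

    return len <= limit;
}
```

Under these assumptions, the 1000000000-byte text node from the PoC fits only when the huge option is set, and anything above 10000000 bytes is rejected without it.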
While XML_MAX_HUGE_LENGTH still caps the size of a single text node,
enabling XML_PARSE_HUGE opens the door to potential resource exhaustion,
since the cap is 100 times higher than XML_MAX_TEXT_LENGTH. I couldn't
find a way to dynamically adjust this limit in libxml2.
One idea would be to guard XML_PARSE_HUGE behind a GUC --- say,
xml_enable_huge_parsing. That would at least allow controlled
environments to opt in. But of course, that wouldn't help current releases.
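A rough, self-contained sketch of how such a guard might look, assuming a boolean setting named xml_enable_huge_parsing that defaults to off. This is not a patch: the GUC registration is omitted, the helper xml_parse_options is hypothetical, and the SKETCH_PARSE_* values are stand-ins meant to mirror libxml2's xmlParserOption flags as I understand them (real code would include <libxml/parser.h> and use XML_PARSE_* directly).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag values; assumptions mirroring libxml2's XML_PARSE_* bits. */
enum
{
    SKETCH_PARSE_NOENT    = 1 << 1,
    SKETCH_PARSE_DTDATTR  = 1 << 3,
    SKETCH_PARSE_NOBLANKS = 1 << 8,
    SKETCH_PARSE_HUGE     = 1 << 19
};

/* Hypothetical GUC variable; off by default so the safety limit stays on. */
static bool xml_enable_huge_parsing = false;

/* Hypothetical helper: build the parser options the way xml_parse might. */
static int
xml_parse_options(bool preserve_whitespace)
{
    int options = SKETCH_PARSE_NOENT | SKETCH_PARSE_DTDATTR
        | (preserve_whitespace ? 0 : SKETCH_PARSE_NOBLANKS);

    /* The base option set never lifts the text node limit by itself. */
    assert((options & SKETCH_PARSE_HUGE) == 0);

    /* Only opt in to the larger limit when the setting is enabled. */
    if (xml_enable_huge_parsing)
        options |= SKETCH_PARSE_HUGE;

    return options;
}
```

Keeping the default off would preserve the current behavior, while an installation that trusts its XML input could switch the setting on.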
Best regards, Jim