Re: Regression with large XML data input

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Treat <rob(at)xzilla(dot)net>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Erik Wienhold <ewie(at)ewie(dot)name>
Subject: Re: Regression with large XML data input
Date: 2025-07-28 07:45:14
Message-ID: cc0bd778-9730-4ef9-98b3-a965f8895331@uni-muenster.de
Lists: pgsql-hackers

On 28.07.25 04:47, Michael Paquier wrote:
> I understand that from the point of view of a
> maintainer this is rather bad, but from the customer point of view the
> current situation is also bad to deal with in the scope of a minor
> upgrade, because applications suddenly break.

I totally get it --- from the user’s perspective, it’s hard to see this
as a bugfix.

I was wondering whether using XML_PARSE_HUGE in xml_parse's options
could help address this, for example:

options = XML_PARSE_NOENT | XML_PARSE_DTDATTR | XML_PARSE_HUGE
          | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS);

According to libxml2's parserInternals.h:

/**
 * Maximum size allowed for a single text node when building a tree.
 * This is not a limitation of the parser but a safety boundary feature,
 * use XML_PARSE_HUGE option to override it.
 * Introduced in 2.9.0
 */
#define XML_MAX_TEXT_LENGTH 10000000

/**
 * Maximum size allowed when XML_PARSE_HUGE is set.
 */
#define XML_MAX_HUGE_LENGTH 1000000000

The XML_MAX_TEXT_LENGTH limit (10 MB) is what we're hitting now, while
XML_MAX_HUGE_LENGTH (roughly 1 GB) is extremely generous. Here's a quick
PoC using XML_PARSE_HUGE:

psql (19devel)
Type "help" for help.

postgres=# CREATE TABLE xmldata (message xml);
CREATE TABLE
postgres=# DO $$
DECLARE huge_size text := repeat('X', 1000000000);
BEGIN
  INSERT INTO xmldata (message) VALUES
  ((('<foo><bar>' || huge_size ||'</bar></foo>')::xml));
END $$;
DO
postgres=# SELECT pg_size_pretty(length(message::text)::bigint) FROM xmldata;
 pg_size_pretty
----------------
 954 MB
(1 row)

While XML_MAX_HUGE_LENGTH prevents unlimited memory usage, it still
opens the door to potential resource exhaustion. I couldn't find a way
to dynamically adjust this limit in libxml2.

One idea would be to guard XML_PARSE_HUGE behind a GUC --- say,
xml_enable_huge_parsing. That would at least allow controlled
environments to opt in. But of course, that wouldn't help current releases.
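
A minimal sketch of how that could look in xml_parse(), assuming a
hypothetical boolean GUC named xml_enable_huge_parsing (the variable
name and wiring are illustrative only, not an actual patch):

/* Hypothetical GUC, defaulting to off so current behaviour is kept. */
static bool xml_enable_huge_parsing = false;

/* In xml_parse(): request libxml2's raised limits only when the GUC is
 * enabled, otherwise keep the default 10 MB text-node limit. */
options = XML_PARSE_NOENT | XML_PARSE_DTDATTR
          | (xml_enable_huge_parsing ? XML_PARSE_HUGE : 0)
          | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS);

Making it superuser-only (PGC_SUSET) and documenting the ~1 GB
per-text-node ceiling would keep the resource-exhaustion exposure an
explicit, administrator-controlled choice.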

Best regards, Jim
