Add GUC to enable libxml2's XML_PARSE_HUGE

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Erik Wienhold <ewie(at)ewie(dot)name>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Add GUC to enable libxml2's XML_PARSE_HUGE
Date: 2025-08-20 15:37:50
Message-ID: 074d9029-45df-4bed-b3c7-58981bd4b545@uni-muenster.de
Lists: pgsql-hackers

Hi,

In commit 71c0921 we re-introduced use of xmlParseBalancedChunkMemory in
order to allow parsing of large XML documents with certain libxml2
versions [1]. While that solved a regression issue, it still leaves the
handling of very large or deeply nested XML documents tied to libxml2’s
internal limits and behavior.

To address this, Erik and I would like to propose a new GUC,
xml_parse_huge, which controls libxml2’s XML_PARSE_HUGE option. This
makes the handling of large XML documents explicit and independent of
libxml2 version quirks. The new predefined role pg_xml_parse_huge allows
superusers to grant session-level use of this option without granting
full superuser rights, so DBAs can flexibly delegate the capability in a
controlled manner.

Examples:

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# CREATE USER u1;
CREATE ROLE
postgres=# CREATE DATABASE db OWNER u1;
CREATE DATABASE
postgres=# \q

# By default the parameter is 'off' and an unprivileged user cannot change it

$ /usr/local/postgres-dev/bin/psql -d db -U u1
psql (19devel)
Type "help" for help.

db=> SHOW xml_parse_huge;
 xml_parse_huge
----------------
 off
(1 row)

db=> SET xml_parse_huge TO on;
ERROR:  permission denied to set parameter "xml_parse_huge"
HINT:  You must be a superuser or a member of the "pg_xml_parse_huge"
role to set this option.

db=> ALTER SYSTEM SET xml_parse_huge TO on;
ERROR:  permission denied to set parameter "xml_parse_huge"

# With xml_parse_huge off, libxml2 raises an error for text nodes exceeding
# XML_MAX_TEXT_LENGTH

db=> CREATE TABLE t1 AS SELECT ('<root>' || repeat('X',10000001) ||
'</root>')::xml;
ERROR:  invalid XML content
DETAIL:  line 1: Resource limit exceeded: Text node too long, try
XML_PARSE_HUGE
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

# The role pg_xml_parse_huge allows the user to set the new parameter

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# GRANT pg_xml_parse_huge TO u1;
GRANT ROLE
postgres=# \q

$ /usr/local/postgres-dev/bin/psql -d db -U u1
psql (19devel)
Type "help" for help.

db=> SET xml_parse_huge TO on;
SET
db=> CREATE TABLE t1 AS SELECT ('<root>' || repeat('X',10000001) ||
'</root>')::xml;
SELECT 1
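
As a sketch only (not taken from the attached patch or its tests): assuming xml_parse_huge behaves like other settable parameters, SET LOCAL should confine it to a single transaction, so the rest of the session keeps the default. The table name t2 is hypothetical.

```sql
-- Assumes the patch is applied and the current user is a member of
-- pg_xml_parse_huge (or a superuser).
BEGIN;
SET LOCAL xml_parse_huge TO on;   -- in effect only until COMMIT/ROLLBACK
CREATE TABLE t2 AS
  SELECT ('<root>' || repeat('X', 10000001) || '</root>')::xml;
COMMIT;
SHOW xml_parse_huge;              -- expected to report the session value again
```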

# It is also possible to enable this feature by default for a user

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# CREATE USER u2;
CREATE ROLE
postgres=# GRANT pg_xml_parse_huge TO u2;
GRANT ROLE
postgres=# ALTER USER u2 SET xml_parse_huge TO on;
ALTER ROLE
postgres=# \q

$ /usr/local/postgres-dev/bin/psql -d db -U u2
psql (19devel)
Type "help" for help.

db=> SHOW xml_parse_huge ;
 xml_parse_huge
----------------
 on
(1 row)

# A superuser can enable this feature for a whole database (or the whole
cluster via postgresql.conf):

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# CREATE DATABASE db2;
CREATE DATABASE
postgres=# ALTER DATABASE db2 SET xml_parse_huge TO on;
ALTER DATABASE
postgres=# SHOW xml_parse_huge ;
 xml_parse_huge
----------------
 off
(1 row)

postgres=# \c db2
You are now connected to database "db2" as user "jim".
db2=# SHOW xml_parse_huge ;
 xml_parse_huge
----------------
 on
(1 row)
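
For the cluster-wide case, a minimal sketch of the equivalent done from SQL rather than by editing postgresql.conf by hand. It assumes the patch is applied and that the parameter can be changed with a configuration reload rather than a restart, which I have not verified against the draft.

```sql
-- Run as a superuser.
ALTER SYSTEM SET xml_parse_huge TO on;  -- written to postgresql.auto.conf
SELECT pg_reload_conf();                -- ask the server to re-read its configuration
SHOW xml_parse_huge;
```

ALTER SYSTEM RESET xml_parse_huge followed by another reload would return to the default.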

Attached is a first draft.

* I'm CC'ing Tom and Michael since they were involved in the earlier
discussion.

Initially we considered creating a second GUC instead of a role, but
decided that would be confusing and less manageable than having a single
GUC with role-based delegation.

Any thoughts or comments?

[1]
https://www.postgresql.org/message-id/flat/a8771e75-60ee-4c99-ae10-ca4832e1ec8d%40uni-muenster.de#1cfece11b1d62fbd43ed644e1f9710e2

Best regards, Jim

Attachment: v1-0001-Add-GUC-to-enable-libxml2-s-XML_PARSE_HUGE-option.patch (text/x-patch, 19.2 KB)
