Re: Regression with large XML data input

From: Erik Wienhold <ewie(at)ewie(dot)name>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Treat <rob(at)xzilla(dot)net>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Regression with large XML data input
Date: 2025-07-28 10:49:02
Message-ID: 457dd309-158f-43cd-81d4-df7284f30d4f@ewie.name
Lists: pgsql-hackers

On 2025-07-28 09:45 +0200, Jim Jones wrote:
>
> On 28.07.25 04:47, Michael Paquier wrote:
> > I understand that from the point of view of a maintainer this is
> > rather bad, but from the customer point of view the current
> > situation is also bad to deal with in the scope of a minor upgrade,
> > because applications suddenly break.
>
> I totally get it --- from the user’s perspective, it’s hard to see
> this as a bugfix.
>
> I was wondering whether using XML_PARSE_HUGE in xml_parse's options
> could help address this, for example:
>
> options = XML_PARSE_NOENT | XML_PARSE_DTDATTR | XML_PARSE_HUGE
>           | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS);

This also came to my mind, but it was already tried and reverted soon
after for security reasons. [1]

> One idea would be to guard XML_PARSE_HUGE behind a GUC --- say,
> xml_enable_huge_parsing. That would at least allow controlled
> environments to opt in. But of course, that wouldn't help current
> releases.

+1 for new major releases. But normal users must not be allowed to
enable that GUC, so it should probably use context PGC_SU_BACKEND.
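
A minimal sketch of how such a GUC could gate the option mask, assuming
a hypothetical boolean GUC named xml_enable_huge_parsing whose value is
passed in as enable_huge. The SKETCH_* constants are stand-ins copied
from libxml2's xmlParserOption values so the example compiles without
libxml2 headers; real code would include <libxml/parser.h> and use
XML_PARSE_NOENT and friends directly. sketch_xml_parse_options() is a
hypothetical name, not an existing PostgreSQL function.

```c
#include <stdbool.h>

/*
 * Stand-ins for libxml2's xmlParserOption bits (same values), so this
 * sketch builds without libxml2.  Real code: #include <libxml/parser.h>.
 */
enum
{
    SKETCH_XML_PARSE_NOENT = 1 << 1,
    SKETCH_XML_PARSE_DTDATTR = 1 << 3,
    SKETCH_XML_PARSE_NOBLANKS = 1 << 8,
    SKETCH_XML_PARSE_HUGE = 1 << 19
};

/*
 * Build the option mask the same way as the xml_parse() snippet quoted
 * above, adding XML_PARSE_HUGE only when the hypothetical GUC
 * xml_enable_huge_parsing is on.  With context PGC_SU_BACKEND, ordinary
 * users could not switch that GUC on themselves.
 */
int
sketch_xml_parse_options(bool preserve_whitespace, bool enable_huge)
{
    int options = SKETCH_XML_PARSE_NOENT | SKETCH_XML_PARSE_DTDATTR
        | (preserve_whitespace ? 0 : SKETCH_XML_PARSE_NOBLANKS);

    /* Lift libxml2's hard-coded size limits only on explicit opt-in. */
    if (enable_huge)
        options |= SKETCH_XML_PARSE_HUGE;

    return options;
}
```

Keeping the default off means existing installations keep libxml2's
size limits unless an administrator deliberately opts in.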

I'm leaning towards Michael's proposal of adding a libxml2 version check
in the stable branches before REL_18_STABLE and parsing the content with
xmlParseBalancedChunkMemory on versions up to 2.12.x.
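
The version check could look roughly like this, assuming libxml2's usual
encoding of LIBXML_VERSION as major * 10000 + minor * 100 + micro (so
2.12.7 is 21207). In the backend this would more likely be a
preprocessor test such as "#if LIBXML_VERSION < 21300" around the call
to xmlParseBalancedChunkMemory; the helper names below are hypothetical
and only illustrate the cut-off at 2.13.0.

```c
#include <stdbool.h>

/* First libxml2 release that should NOT take the fallback path. */
#define SKETCH_LIBXML_2_13_0 21300

/* Encode a version the way libxml2's LIBXML_VERSION does. */
int
sketch_libxml_version(int major, int minor, int micro)
{
    return major * 10000 + minor * 100 + micro;
}

/*
 * True for libxml2 2.12.x and older, i.e. the versions for which the
 * stable branches would keep parsing content with
 * xmlParseBalancedChunkMemory().
 */
bool
sketch_use_balanced_chunk(int libxml_version)
{
    return libxml_version < SKETCH_LIBXML_2_13_0;
}
```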

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f2743a7d70e7b2891277632121bb51e739743a47

--
Erik Wienhold
