Re: Regression with large XML data input

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Erik Wienhold <ewie(at)ewie(dot)name>
Subject: Re: Regression with large XML data input
Date: 2025-07-24 23:25:48
Message-ID: 7a9a9804-01d8-4489-a831-1a222bae6705@uni-muenster.de
Lists: pgsql-hackers

On 24.07.25 21:23, Tom Lane wrote:
> Oh, wait ... the plot thickens! The above statement is true
> when testing on my Mac with libxml2 2.13.8 from MacPorts.
> With either HEAD or f68d6aabb7e2^, I get errors similar to
> what Erik just showed:
>
> ERROR: invalid XML content
> DETAIL: line 1: Resource limit exceeded: Text node too long, try XML_PARSE_HUGE
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

I get the same error with libxml2 2.9.14 on Ubuntu.

> However, when testing on RHEL8 with libxml2 2.9.7, indeed
> I get "Huge input lookup" with our current code but no
> failure with f68d6aabb7e2^.
>
> The way I interpret these results is that in older libxml2 versions,
> xmlParseBalancedChunkMemory is missing an XML_ERR_RESOURCE_LIMIT check
> that does exist in newer versions. So even if we were to do some kind
> of reversion, it would only prevent the error in libxml2 versions that
> lack that check. And in those versions we'd probably be exposing
> ourselves to resource-exhaustion problems.
>
> On the whole I'm thinking more and more that we don't want to
> touch this. Our recommendation for processing multi-megabyte
> chunks of XML should be "don't". Unless we want to find or
> write a replacement for libxml2 ... which we have discussed,
> but so far nothing's happened.

I agree that lifting this limit is probably not worth the associated
risks. Moreover, a single text node of more than 10 MB is rather large
and probably exceeds the needs of most users.
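
For anyone who wants to reproduce the behaviour on their own libxml2 build, here is a minimal sketch of a generator for such an input. It assumes the default text node limit is libxml2's XML_MAX_TEXT_LENGTH of 10,000,000 bytes; the function name and the 1 MB margin are only illustrative and do not come from the PostgreSQL or libxml2 sources.

```python
# Sketch: build an XML fragment whose single text node is larger than
# libxml2's default limit, to check how a given build reacts.
# Assumption: XML_MAX_TEXT_LENGTH is 10,000,000 bytes (libxml2 default
# when XML_PARSE_HUGE is not set).
XML_MAX_TEXT_LENGTH = 10_000_000


def make_xml_with_text_node(n_bytes: int, tag: str = "a") -> str:
    """Return '<tag>XXX...</tag>' with a text node of n_bytes ASCII bytes.

    Hypothetical helper for illustration only.
    """
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    return f"<{tag}>{'X' * n_bytes}</{tag}>"


if __name__ == "__main__":
    # One document well below the limit and one well above it
    # (1 MB margin, so the exact boundary check does not matter).
    small = make_xml_with_text_node(XML_MAX_TEXT_LENGTH - 1_000_000)
    large = make_xml_with_text_node(XML_MAX_TEXT_LENGTH + 1_000_000)
    print(len(small), len(large))
```

The resulting strings can then be fed to XMLPARSE(CONTENT ...) or a cast to xml. Whether the larger one is rejected depends on the libxml2 version, as described above.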

Best, Jim
