From: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Erik Wienhold <ewie(at)ewie(dot)name> |
Subject: | Re: Regression with large XML data input |
Date: | 2025-07-24 23:25:48 |
Message-ID: | 7a9a9804-01d8-4489-a831-1a222bae6705@uni-muenster.de |
Lists: | pgsql-hackers |
On 24.07.25 21:23, Tom Lane wrote:
> Oh, wait ... the plot thickens! The above statement is true
> when testing on my Mac with libxml2 2.13.8 from MacPorts.
> With either HEAD or f68d6aabb7e2^, I get errors similar to
> what Erik just showed:
>
> ERROR: invalid XML content
> DETAIL: line 1: Resource limit exceeded: Text node too long, try XML_PARSE_HUGE
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
I get the same error with libxml2 2.9.14 on Ubuntu.
> However, when testing on RHEL8 with libxml2 2.9.7, indeed
> I get "Huge input lookup" with our current code but no
> failure with f68d6aabb7e2^.
>
> The way I interpret these results is that in older libxml2 versions,
> xmlParseBalancedChunkMemory is missing an XML_ERR_RESOURCE_LIMIT check
> that does exist in newer versions. So even if we were to do some kind
> of reversion, it would only prevent the error in libxml2 versions that
> lack that check. And in those versions we'd probably be exposing
> ourselves to resource-exhaustion problems.
>
> On the whole I'm thinking more and more that we don't want to
> touch this. Our recommendation for processing multi-megabyte
> chunks of XML should be "don't". Unless we want to find or
> write a replacement for libxml2 ... which we have discussed,
> but so far nothing's happened.
I also believe that addressing this limitation may not be worth the
associated risks. Moreover, a 10 MB text node is rather large and
probably exceeds the needs of most users.
Best, Jim