Re: Regression with large XML data input

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Erik Wienhold <ewie(at)ewie(dot)name>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Treat <rob(at)xzilla(dot)net>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Regression with large XML data input
Date: 2025-07-29 10:15:28
Message-ID: 93116cb2-e3e2-40bf-be9b-77a2f68fd10a@uni-muenster.de
Lists: pgsql-hackers

On 28.07.25 22:16, Tom Lane wrote:
> Erik's v2 is slightly wrong as to the save-and-restore logic for
> the KeepBlanks setting: we need to restore in the error path too,
> and we'd better mark the save variable volatile since it's modified
> inside the PG_TRY. I made some other cosmetic changes, mainly to
> avoid calculating "options" when it won't be used. I tested the
> attached v3 against RHEL8's libxml2-2.9.7, as well as against today's
> libxml2 git master, and it accepts the problematic input on both.
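
For readers unfamiliar with the volatile remark: PG_TRY is built on setjmp/longjmp, and a local variable modified after setjmp has an indeterminate value after longjmp unless it is declared volatile. Below is a minimal, self-contained sketch of that save-and-restore pattern using plain setjmp/longjmp. It is not the patch's code; keep_blanks_setting, parse_chunk, and parse_with_blanks_disabled are hypothetical stand-ins for the libxml2 setting and the parser call.

```c
#include <setjmp.h>

/* Hypothetical stand-in for a global parser setting such as KeepBlanks. */
static int keep_blanks_setting = 1;

/* Jump target used to simulate an error thrown from inside the "try" block. */
static jmp_buf error_jump;

/* Hypothetical parser call: reports failure by jumping out, like ereport(ERROR). */
static void
parse_chunk(int should_fail)
{
    if (should_fail)
        longjmp(error_jump, 1);
}

/*
 * Temporarily changes the setting, runs the parser, and restores the setting
 * on both the success path and the error path.
 * Returns 0 on success, -1 if the parser failed.
 */
static int
parse_with_blanks_disabled(int should_fail)
{
    /*
     * volatile because the variable is modified after setjmp() and read
     * after longjmp(); without it the value would be indeterminate.
     */
    volatile int save_keep_blanks = -1;

    if (setjmp(error_jump) == 0)
    {
        /* "try" block: save the old value, then change the setting */
        save_keep_blanks = keep_blanks_setting;
        keep_blanks_setting = 0;
        parse_chunk(should_fail);
    }
    else
    {
        /* error path: restore only if the save actually happened */
        if (save_keep_blanks != -1)
            keep_blanks_setting = save_keep_blanks;
        return -1;
    }

    /* success path: restore as well */
    keep_blanks_setting = save_keep_blanks;
    return 0;
}
```

Calling parse_with_blanks_disabled(1) returns -1 and parse_with_blanks_disabled(0) returns 0; in both cases keep_blanks_setting is back to its previous value afterwards.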

Out of curiosity, what's the reasoning behind keeping node_list instead
of directly using parsed_nodes in the xmlParseBalancedChunkMemory call?

Example:

if (*(utf8string + count))
{
    res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
                                           utf8string + count,
                                           parsed_nodes);
    if (res_code != 0 || xmlerrcxt->err_occurred)
    {
        xml_errsave(escontext, xmlerrcxt,
                    ERRCODE_INVALID_XML_CONTENT,
                    "invalid XML content");
        goto fail;
    }
}

I was also wondering whether we should add a GUC in PG 19 to enable
XML_MAX_HUGE_LENGTH when needed. If we go down that route, we'd likely
need to revisit xmlParseBalancedChunkMemory (again!), since it appears to
be hardcoded to XML_MAX_TEXT_LENGTH. Any thoughts?

Best regards, Jim

Attachment             Content-Type  Size
remove-node_list.diff  text/x-patch  866 bytes
