Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-21 05:46:15
Message-ID: CANhcyEWp_T7tX-yKbdbxdUR144UAZ7oxNM_AORfCvWHZg0ja5w@mail.gmail.com
Lists: pgsql-bugs

On Mon, 19 May 2025 at 20:08, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com> wrote:
>
> PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
> 6.8.0); x86-64
>
> Good morning from DeepBlueCapital. Soon after upgrading to 17.5 from 17.4, we
> started seeing logical replication failures with publisher errors like this:
>
> ERROR: invalid memory alloc request size 1196493216
>
> (the exact size varies). Here is a typical log extract from the publisher:
>
> 2025-05-19 10:30:14 CEST [1348336-465] remote_production_user(at)blue DEBUG:
> 00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
> 2025-05-19 10:30:07.467048+02
> 2025-05-19 10:30:14 CEST [1348336-466] remote_production_user(at)blue LOCATION:
> ProcessStandbyReplyMessage, walsender.c:2431
> 2025-05-19 10:30:14 CEST [1348336-467] remote_production_user(at)blue DEBUG:
> 00000: skipped replication of an empty transaction with XID: 207637565
> 2025-05-19 10:30:14 CEST [1348336-468] remote_production_user(at)blue CONTEXT:
> slot "jnb_production", output plugin "pgoutput", in the commit callback,
> associated LSN FB03/349FF938
> 2025-05-19 10:30:14 CEST [1348336-469] remote_production_user(at)blue LOCATION:
> pgoutput_commit_txn, pgoutput.c:629
> 2025-05-19 10:30:14 CEST [1348336-470] remote_production_user(at)blue DEBUG:
> 00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
> 2025-05-19 10:30:14 CEST [1348336-471] remote_production_user(at)blue LOCATION:
> UpdateDecodingStats, logical.c:1943
> 2025-05-19 10:30:14 CEST [1348336-472] remote_production_user(at)blue DEBUG:
> 00000: found top level transaction 207637519, with catalog changes
> 2025-05-19 10:30:14 CEST [1348336-473] remote_production_user(at)blue LOCATION:
> SnapBuildCommitTxn, snapbuild.c:1150
> 2025-05-19 10:30:14 CEST [1348336-474] remote_production_user(at)blue DEBUG:
> 00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
> 2025-05-19 10:30:14 CEST [1348336-475] remote_production_user(at)blue LOCATION:
> SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
> 2025-05-19 10:30:14 CEST [1348336-476] remote_production_user(at)blue ERROR:
> XX000: invalid memory alloc request size 1196493216
>
> If I'm reading it right, things go wrong on the publisher while preparing the
> message, i.e. it's not a subscriber problem.
>
> This particular instance was triggered by a large number of catalog
> invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
> FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
>
> rmgr: Transaction len (rec/tot): 10665/ 10665, tx: 207637519, lsn:
> FB03/34A1AAE0, prev FB03/34A1A8C8, desc: COMMIT 2025-05-19 08:10:12.880599 CEST;
> dropped stats: 2/17426/661557718 2/17426/661557717 2/17426/661557714
> 2/17426/661557678 2/17426/661557677 2/17426/661557674 2/17426/661557673
> 2/17426/661557672 2/17426/661557669 2/17426/661557618 2/17426/661557617
> 2/17426/661557614; inval msgs: catcache 80 catcache 79 catcache 80 catcache 79
> catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55
> catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55
> catcache 54 catcache 55 catcache 54 catcache 55 catcache 54 catcache 80 catcache
> 79 catcache 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache
> 55 catcache 54 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 63 catcache 63 catcache 55 catcache 54 catcache 80 catcache 79 catcache
> 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache
> 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache 55 catcache
> 54 catcache 80 catcache 79 catcache 80 catcache 79 catcache 55 catcache 54
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55
> catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55 catcache 54
> catcache 55 catcache 54 catcache 55 catcache 54 catcache 63 catcache 63 catcache
> 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 63 catcache 63 catcache 55 catcache 54 catcache 32 catcache 7 catcache
> 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache
> 79 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 7 catcache
> 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 32
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80 catcache 79
> catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 32 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache 79 catcache
> 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63 catcache 63
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
> catcache 32 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80
> catcache 79 catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6
> catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
> catcache 6 catcache 55 catcache 54 snapshot 2608 relcache 661557614 snapshot
> 1214 relcache 661557617 relcache 661557618 relcache 661557617 snapshot 2608
> relcache 661557617 relcache 661557618 relcache 661557614 snapshot 2608 snapshot
> 2608 relcache 661557669 snapshot 1214 relcache 661557672 relcache 661557673
> relcache 661557672 snapshot 2608 relcache 661557672 relcache 661557673 relcache
> 661557669 snapshot 2608 relcache 661557669 snapshot 2608 relcache 661557674
> snapshot 1214 relcache 661557677 relcache 661557678 relcache 661557677 snapshot
> 2608 relcache 661557677 relcache 661557678 relcache 661557674 snapshot 2608
> snapshot 2608 relcache 661557714 snapshot 1214 relcache 661557717 relcache
> 661557718 relcache 661557717 snapshot 2608 relcache 661557717 relcache 661557718
> relcache 661557714 snapshot 2608 relcache 661557714 relcache 661557718 relcache
> 661557717 snapshot 2608 relcache 661557717 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557714 snapshot 2608 snapshot 1214 relcache 661557678 relcache
> 661557677 snapshot 2608 relcache 661557677 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557674 snapshot 2608 snapshot 1214 relcache 661557673 relcache
> 661557672 snapshot 2608 relcache 661557672 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557669 snapshot 2608 snapshot 1214 relcache 661557618 relcache
> 661557617 snapshot 2608 relcache 661557617 snapshot 2608 snapshot 2608 snapshot
> 2608 relcache 661557614 snapshot 2608 snapshot 1214
>
> While it is long, it doesn't seem to merit allocating anything like 1GB of
> memory. So I'm guessing that postgres is miscalculating the required size somehow.
>
> If I skip over this LSN, for example by dropping the subscription and recreating
> it anew, then things go fine for a while before hitting another "invalid memory
> alloc request", i.e. it wasn't just a one-off. On the other hand, after
> downgrading to 17.4, subscribers spontaneously recovered and the issue has gone
> away. Since I didn't skip over the last LSN of this kind, presumably 17.4
> successfully serialized a message for the same problematic bit of WAL that
> caused 17.5 to blow up, which suggests a regression between 17.4 and 17.5.
>
Hi Duncan,

Thanks for reporting this.
I tried adding around 80000 invalidations but could not reproduce the issue.
Can you share the steps to reproduce the above scenario?

Thanks and Regards,
Shlok Kyal
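
For reference, the observation in the quoted report that the commit record "doesn't seem to merit allocating anything like 1GB" can be checked roughly with a small script. The sketch below counts the invalidation messages in a pg_waldump description line and compares an estimated size against PostgreSQL's allocation limit. The helper names are hypothetical, the 16-byte message size is an assumption about the build, and only the three message kinds that appear in the quoted output are matched. It puts numbers on the mismatch; it does not explain the cause.

```python
import re
from collections import Counter

# Assumption: sizeof(SharedInvalidationMessage) is 16 bytes on this build.
INVAL_MSG_SIZE = 16
# MaxAllocSize as defined in PostgreSQL's memutils.h: 0x3fffffff (1 GB - 1).
MAX_ALLOC_SIZE = 0x3FFFFFFF

# Only the message kinds seen in the quoted pg_waldump output are matched.
_MSG_RE = re.compile(r"\b(catcache|relcache|snapshot)\s+\d+")


def count_inval_messages(desc: str) -> Counter:
    """Count invalidation messages per kind in a pg_waldump COMMIT description."""
    # Drop mail quote markers so entries wrapped across lines still match.
    unquoted = re.sub(r"^>\s?", "", desc, flags=re.MULTILINE)
    # Ignore everything before the "inval msgs:" list (e.g. "dropped stats").
    _, _, tail = unquoted.partition("inval msgs:")
    return Counter(m.group(1) for m in _MSG_RE.finditer(tail))


def estimated_bytes(counts: Counter) -> int:
    """Estimate the memory needed to hold the counted messages once."""
    return sum(counts.values()) * INVAL_MSG_SIZE


def exceeds_max_alloc(request_size: int) -> bool:
    """True if a request of this size would be rejected as too large."""
    return request_size > MAX_ALLOC_SIZE


if __name__ == "__main__":
    sample = (
        "desc: COMMIT ...; inval msgs: catcache 80 catcache 79 "
        "snapshot 2608 relcache 661557614"
    )
    counts = count_inval_messages(sample)
    print(counts, estimated_bytes(counts))
    # The size reported in the publisher log above:
    print(exceeds_max_alloc(1196493216))
```

Feeding it the full pg_waldump line from the report gives the per-record estimate, which can then be set against the request size in the error message.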
