From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-21 05:48:24 |
Message-ID: | CAA4eK1JwJw6JOnfDxtGtSRF7kM0LbEVPRmNxWeJa5+wyoG05Xg@mail.gmail.com |
Lists: | pgsql-bugs |
On Mon, May 19, 2025 at 8:08 PM Duncan Sands
<duncan(dot)sands(at)deepbluecap(dot)com> wrote:
>
> PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
> 6.8.0); x86-64
>
> Good morning from DeepBlueCapital. Soon after upgrading to 17.5 from 17.4, we
> started seeing logical replication failures with publisher errors like this:
>
> ERROR: invalid memory alloc request size 1196493216
>
> (the exact size varies). Here is a typical log extract from the publisher:
>
> 2025-05-19 10:30:14 CEST [1348336-465] remote_production_user(at)blue DEBUG:
> 00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
> 2025-05-19 10:30:07.467048+02
> 2025-05-19 10:30:14 CEST [1348336-466] remote_production_user(at)blue LOCATION:
> ProcessStandbyReplyMessage, walsender.c:2431
> 2025-05-19 10:30:14 CEST [1348336-467] remote_production_user(at)blue DEBUG:
> 00000: skipped replication of an empty transaction with XID: 207637565
> 2025-05-19 10:30:14 CEST [1348336-468] remote_production_user(at)blue CONTEXT:
> slot "jnb_production", output plugin "pgoutput", in the commit callback,
> associated LSN FB03/349FF938
> 2025-05-19 10:30:14 CEST [1348336-469] remote_production_user(at)blue LOCATION:
> pgoutput_commit_txn, pgoutput.c:629
> 2025-05-19 10:30:14 CEST [1348336-470] remote_production_user(at)blue DEBUG:
> 00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
> 2025-05-19 10:30:14 CEST [1348336-471] remote_production_user(at)blue LOCATION:
> UpdateDecodingStats, logical.c:1943
> 2025-05-19 10:30:14 CEST [1348336-472] remote_production_user(at)blue DEBUG:
> 00000: found top level transaction 207637519, with catalog changes
> 2025-05-19 10:30:14 CEST [1348336-473] remote_production_user(at)blue LOCATION:
> SnapBuildCommitTxn, snapbuild.c:1150
> 2025-05-19 10:30:14 CEST [1348336-474] remote_production_user(at)blue DEBUG:
> 00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
> 2025-05-19 10:30:14 CEST [1348336-475] remote_production_user(at)blue LOCATION:
> SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
> 2025-05-19 10:30:14 CEST [1348336-476] remote_production_user(at)blue ERROR:
> XX000: invalid memory alloc request size 1196493216
>
> If I'm reading it right, things go wrong on the publisher while preparing the
> message, i.e. it's not a subscriber problem.
>
Right, I also think so.
> This particular instance was triggered by a large number of catalog
> invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
> FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
>
...
...
>
> While it is long, it doesn't seem to merit allocating anything like 1GB of
> memory. So I'm guessing that postgres is miscalculating the required size somehow.
>
We fixed a bug in commit 4909b38af0 by distributing invalidations at
transaction end to avoid data loss in certain cases, and that change could
cause such a problem. What I am wondering is this: even prior to that
commit, we would eventually end up allocating the required memory for all
of a transaction's invalidations because of the repalloc in
ReorderBufferAddInvalidations, so why does it matter with this commit? One
possibility is that we now need such allocations for multiple in-progress
transactions. I'll think more about this. It would be helpful if
you could share more details about the workload or, if possible, a
test case or script with which we can reproduce this problem.
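
To make the "multiple in-progress transactions" possibility concrete, here is
a minimal model in C. It is only a sketch and not the reorderbuffer code: the
names ModelTxn, model_request_size, model_add_invalidations and
model_distribute are hypothetical, the 16-byte message size is an assumption,
and the limit is assumed to match MaxAllocSize (1 GB - 1). It only tracks
message counts and checks whether a grow-in-place request would pass the size
limit, without allocating anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same value as PostgreSQL's MaxAllocSize (1 GB - 1). */
#define MODEL_MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/* Assumption: one invalidation message occupies 16 bytes (64-bit build). */
#define MODEL_INVAL_MSG_SIZE ((size_t) 16)

/* Hypothetical stand-in for a transaction's accumulated invalidations. */
typedef struct ModelTxn
{
    uint64_t    ninvalidations; /* messages accumulated so far */
} ModelTxn;

/* Bytes a single contiguous array of nmsgs messages would request. */
static size_t
model_request_size(uint64_t nmsgs)
{
    return (size_t) nmsgs * MODEL_INVAL_MSG_SIZE;
}

/*
 * Model appending nmsgs messages to txn by growing one contiguous array.
 * Returns false, leaving txn unchanged, when the grown array would exceed
 * the allocation limit (the point where the real server would raise
 * "invalid memory alloc request size").
 */
static bool
model_add_invalidations(ModelTxn *txn, uint64_t nmsgs)
{
    uint64_t    total;

    assert(txn != NULL);
    total = txn->ninvalidations + nmsgs;
    if (model_request_size(total) > MODEL_MAX_ALLOC_SIZE)
        return false;
    txn->ninvalidations = total;
    return true;
}

/*
 * Model a committing transaction handing its nmsgs messages to every
 * in-progress transaction. Returns how many of them would hit the limit.
 */
static int
model_distribute(ModelTxn *in_progress, int ntxns, uint64_t nmsgs)
{
    int         failures = 0;

    for (int i = 0; i < ntxns; i++)
    {
        if (!model_add_invalidations(&in_progress[i], nmsgs))
            failures++;
    }
    return failures;
}
```

In this model a long-running transaction that keeps receiving invalidations
from every concurrent commit can cross the limit even though no single
committing transaction carries anywhere near 1 GB of messages. Whether that
is what happens in the reported case still needs to be confirmed with a
reproducer.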
--
With Regards,
Amit Kapila.