| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> | 
|---|---|
| To: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com> | 
| Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org | 
| Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 | 
| Date: | 2025-05-21 05:48:24 | 
| Message-ID: | CAA4eK1JwJw6JOnfDxtGtSRF7kM0LbEVPRmNxWeJa5+wyoG05Xg@mail.gmail.com | 
| Lists: | pgsql-bugs | 
On Mon, May 19, 2025 at 8:08 PM Duncan Sands
<duncan(dot)sands(at)deepbluecap(dot)com> wrote:
>
> PostgreSQL v17.5 (Ubuntu 17.5-1.pgdg24.04+1); Ubuntu 24.04.2 LTS (kernel
> 6.8.0); x86-64
>
> Good morning from DeepBlueCapital.  Soon after upgrading to 17.5 from 17.4, we
> started seeing logical replication failures with publisher errors like this:
>
>    ERROR:  invalid memory alloc request size 1196493216
>
> (the exact size varies).  Here is a typical log extract from the publisher:
>
> 2025-05-19 10:30:14 CEST [1348336-465] remote_production_user(at)blue DEBUG:
> 00000: write FB03/349DEF90 flush FB03/349DEF90 apply FB03/349DEF90 reply_time
> 2025-05-19 10:30:07.467048+02
> 2025-05-19 10:30:14 CEST [1348336-466] remote_production_user(at)blue LOCATION:
>   ProcessStandbyReplyMessage, walsender.c:2431
> 2025-05-19 10:30:14 CEST [1348336-467] remote_production_user(at)blue DEBUG:
> 00000: skipped replication of an empty transaction with XID: 207637565
> 2025-05-19 10:30:14 CEST [1348336-468] remote_production_user(at)blue CONTEXT:
> slot "jnb_production", output plugin "pgoutput", in the commit callback,
> associated LSN FB03/349FF938
> 2025-05-19 10:30:14 CEST [1348336-469] remote_production_user(at)blue LOCATION:
>   pgoutput_commit_txn, pgoutput.c:629
> 2025-05-19 10:30:14 CEST [1348336-470] remote_production_user(at)blue DEBUG:
> 00000: UpdateDecodingStats: updating stats 0x5ae1616c17a8 0 0 0 0 1 0 1 191
> 2025-05-19 10:30:14 CEST [1348336-471] remote_production_user(at)blue LOCATION:
>   UpdateDecodingStats, logical.c:1943
> 2025-05-19 10:30:14 CEST [1348336-472] remote_production_user(at)blue DEBUG:
> 00000: found top level transaction 207637519, with catalog changes
> 2025-05-19 10:30:14 CEST [1348336-473] remote_production_user(at)blue LOCATION:
>   SnapBuildCommitTxn, snapbuild.c:1150
> 2025-05-19 10:30:14 CEST [1348336-474] remote_production_user(at)blue DEBUG:
> 00000: adding a new snapshot and invalidations to 207616976 at FB03/34A1AAE0
> 2025-05-19 10:30:14 CEST [1348336-475] remote_production_user(at)blue LOCATION:
>   SnapBuildDistributeSnapshotAndInval, snapbuild.c:915
> 2025-05-19 10:30:14 CEST [1348336-476] remote_production_user(at)blue ERROR:
> XX000: invalid memory alloc request size 1196493216
>
> If I'm reading it right, things go wrong on the publisher while preparing the
> message, i.e. it's not a subscriber problem.
>
Right, I also think so.
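
For context, the error text itself comes from the generic palloc() size check: ordinary allocation requests are capped at MaxAllocSize (just under 1 GiB), and anything larger is rejected outright. A rough sketch of that guard (paraphrased, not the actual mcxt.c code):

```c
/*
 * Rough sketch of the palloc()-family size guard (paraphrased, not the
 * actual PostgreSQL mcxt.c code).  Ordinary requests above MaxAllocSize
 * are rejected with the message seen in the log above.
 */
#include <stdio.h>
#include <stdlib.h>

#define MaxAllocSize ((size_t) 0x3fffffff)      /* just under 1 GiB */

void *palloc_sketch(size_t size)
{
    if (size > MaxAllocSize)
    {
        /* In the server this is an elog(ERROR, ...) that aborts the command. */
        fprintf(stderr, "ERROR:  invalid memory alloc request size %zu\n", size);
        exit(1);
    }
    return malloc(size);
}

int main(void)
{
    /* The request from the log, 1196493216 bytes, exceeds the cap. */
    palloc_sketch(1196493216);
    return 0;
}
```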
> This particular instance was triggered by a large number of catalog
> invalidations: I dumped what I think is the relevant WAL with "pg_waldump -s
> FB03/34A1AAE0 -p 17/main/ --xid=207637519" and the output was a single long line:
>
...
...
>
> While it is long, it doesn't seem to merit allocating anything like 1GB of
> memory.  So I'm guessing that postgres is miscalculating the required size somehow.
>
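As a quick sanity check on the number itself (assuming sizeof(SharedInvalidationMessage) is 16 bytes on this build, which is an assumption on my part), a request of that size corresponds to roughly 75 million invalidation messages:

```c
#include <stdio.h>

int main(void)
{
    size_t request = 1196493216;    /* size from the ERROR above */
    size_t msgsize = 16;            /* assumed sizeof(SharedInvalidationMessage) */

    /* Prints 74780826, i.e. ~75 million messages for a single allocation. */
    printf("%zu\n", request / msgsize);
    return 0;
}
```
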
We fixed a bug in commit 4909b38af0 to distribute invalidations at the
end of a transaction to avoid data loss in certain cases, and that
change could cause such a problem. What I am wondering is that even
prior to that commit, we would eventually have allocated the required
memory for all of a transaction's invalidations because of the
repalloc in ReorderBufferAddInvalidations, so why does it matter with
this commit? One possibility is that we now need such allocations for
multiple in-progress transactions. I'll think more about this. It
would be helpful if you could share more details about the workload,
or, if possible, a test case or script with which we can reproduce the
problem.
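
To illustrate the accumulation pattern I have in mind, here is a simplified sketch (not the actual reorderbuffer.c code): each append grows the transaction's flat array of invalidation messages to the combined size, and distributing the messages to every concurrent in-progress transaction multiplies how much is held at once.

```c
/*
 * Simplified sketch of the accumulation pattern (not the actual
 * ReorderBufferAddInvalidations code).  Each transaction keeps one flat
 * array of invalidation messages; appending grows it to the combined
 * size, and in the server any single array crossing ~1 GiB would hit
 * the "invalid memory alloc request size" error.
 */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct InvalMsg { char payload[16]; } InvalMsg; /* stand-in for SharedInvalidationMessage */

typedef struct Txn
{
    size_t    ninvalidations;
    InvalMsg *invalidations;
} Txn;

static void add_invalidations(Txn *txn, size_t nmsgs, const InvalMsg *msgs)
{
    /* In the server this is a palloc/repalloc of the combined size. */
    txn->invalidations = realloc(txn->invalidations,
                                 (txn->ninvalidations + nmsgs) * sizeof(InvalMsg));
    memcpy(txn->invalidations + txn->ninvalidations, msgs,
           nmsgs * sizeof(InvalMsg));
    txn->ninvalidations += nmsgs;
}

/*
 * Distributing the committing transaction's invalidations to every
 * concurrent in-progress transaction stores the same messages once per
 * transaction, so the totals can grow much faster than before.
 */
static void distribute_invalidations(Txn **inprogress, size_t ntxns,
                                     size_t nmsgs, const InvalMsg *msgs)
{
    for (size_t i = 0; i < ntxns; i++)
        add_invalidations(inprogress[i], nmsgs, msgs);
}

int main(void)
{
    Txn a = {0, NULL}, b = {0, NULL};
    Txn *inprogress[] = {&a, &b};
    InvalMsg batch[4];

    memset(batch, 0, sizeof(batch));
    distribute_invalidations(inprogress, 2, 4, batch);
    return 0;
}
```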
-- 
With Regards,
Amit Kapila.