Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-22 12:23:47
Message-ID: CANhcyEW8UyMr_7idB580DT3bjtB=EKiHwecTx5KC3ggiVs9c+A@mail.gmail.com
Lists: pgsql-bugs

On Wed, 21 May 2025 at 17:18, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear hackers,
>
> > I think the problem here is that when we are distributing
> > invalidations to a concurrent transaction, in addition to queuing the
> > invalidations as a change, we also copy the distributed invalidations
> > along with the original transaction's invalidations via repalloc in
> > ReorderBufferAddInvalidations. So, when there are many in-progress
> > transactions, each would try to copy all its accumulated invalidations
> > to the remaining in-progress transactions. This could lead to such an
> > increase in allocation request size. However, after queuing the
> > change, we don't need to copy it along with the original transaction's
> > invalidations. This is because the copy is only required when we don't
> > process any changes in cases like ReorderBufferForget(). I have
> > analyzed all such cases, and my analysis is as follows:
>
> Based on the analysis, I created a PoC which avoids the repalloc().
> Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
> no longer added to the list; they are only queued, so the repalloc() can be skipped.
> Also, the function distributes only the messages in the list, so received messages
> won't be sent again.
>
> I have now created a patch for PG17 for testing purposes. Duncan, can you apply it and
> confirm whether it solves the issue?
>
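To make the growth described in the quoted analysis concrete, here is a toy Python model. It is not PostgreSQL code: it only counts messages per in-progress transaction, and the names `peak_list_length` and `copy_received` are hypothetical. With `copy_received=True`, each committing transaction's whole list (including messages it previously received) is appended to every other in-progress transaction's list, which is the repalloc path in ReorderBufferAddInvalidations described above. With `copy_received=False`, received messages are assumed to be only queued as a change, as in Kuroda-san's PoC, so lists never include them.

```python
# Toy model (not PostgreSQL code) of invalidation-list growth when committing
# transactions distribute their accumulated invalidations to concurrent ones.


def peak_list_length(n_txns, ddl_per_commit, rounds, copy_received):
    """Return the largest per-transaction invalidation list seen.

    copy_received=True:  distributed messages are appended to the receiver's
                         own list, so they are re-distributed when it commits.
    copy_received=False: distributed messages are only queued as a change.
    """
    lists = [0] * n_txns  # messages accumulated in each in-progress txn's list
    peak = 0
    for _ in range(rounds):
        for i in range(n_txns):
            lists[i] += ddl_per_commit  # txn i performs its own DDL
            sent = lists[i]             # on commit, its whole list is distributed
            if copy_received:
                for j in range(n_txns):
                    if j != i:
                        lists[j] += sent
            peak = max(peak, max(lists))
            lists[i] = 0                # txn i restarts with an empty list


    return peak


print(peak_list_length(3, 1, 2, copy_received=True))   # → 32
print(peak_list_length(3, 1, 2, copy_received=False))  # → 1
```

Even with three transactions and one DDL each, the copying variant grows roughly geometrically, because every received batch is sent on again at the next commit; the queue-only variant stays bounded by each transaction's own DDL count.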
Hi,

I was able to reproduce the issue with the following test:

1. Begin 9 concurrent transactions (BEGIN; INSERT INTO t1 VALUES (11);).
2. In a 10th concurrent transaction, perform 1000 DDL statements (ALTER PUBLICATION ... ADD/DROP TABLE).
3. For each of the 9 concurrent transactions, perform:
   i. 1000 DDL statements;
   ii. COMMIT;
   iii. BEGIN; INSERT INTO t1 VALUES (11);
4. Repeat steps 2 and 3 in a loop.
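The steps above can be laid out as a per-session statement list. This is a hypothetical Python sketch of one reading of the steps, not the attached 036_test.pl (which is authoritative): `pub1` and `t1` come from this report, while the table `t2` toggled in and out of the publication, and committing the 10th session's DDL in step 2, are assumptions. It only generates `(session, sql)` pairs; a driver would send each statement on that session's own connection.

```python
# Hypothetical sketch: the SQL each session runs, as read from the steps above.
# pub1 and t1 come from the report; t2 and the step-2 COMMIT are assumptions.

PUB, INSERT_TABLE, DDL_TABLE = "pub1", "t1", "t2"


def ddl_batch(n):
    """n ALTER PUBLICATION statements alternating ADD/DROP (even n leaves pub1 unchanged)."""
    return [
        f"ALTER PUBLICATION {PUB} {'ADD' if k % 2 == 0 else 'DROP'} TABLE {DDL_TABLE};"
        for k in range(n)
    ]


def begin_and_insert(session):
    return [(session, "BEGIN;"), (session, f"INSERT INTO {INSERT_TABLE} VALUES (11);")]


def step1(n_concurrent=9):
    """Step 1: open the transactions that stay in progress."""
    return [stmt for s in range(n_concurrent) for stmt in begin_and_insert(s)]


def steps2_and_3(n_concurrent=9, n_ddl=1000):
    """One loop iteration of steps 2 and 3 as (session, sql) pairs, in order."""
    ddl_session = n_concurrent  # the 10th session
    script = [(ddl_session, "BEGIN;")]
    script += [(ddl_session, q) for q in ddl_batch(n_ddl)]
    script.append((ddl_session, "COMMIT;"))  # assumption: step 2 commits its DDL
    for s in range(n_concurrent):
        script += [(s, q) for q in ddl_batch(n_ddl)]  # 3.i
        script.append((s, "COMMIT;"))                 # 3.ii
        script += begin_and_insert(s)                 # 3.iii
    return script


workload = step1() + steps2_and_3() * 3  # step 4: three loop iterations
```

Alternating ADD/DROP keeps the publication in the same state after each batch, so every session can repeat the same statements on each iteration.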

These steps reproduced the error:
2025-05-22 19:03:35.111 JST [63150] sub1 ERROR: invalid memory alloc request size 1555752832
2025-05-22 19:03:35.111 JST [63150] sub1 STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', streaming 'parallel', origin 'any', publication_names '"pub1"')
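Why this request fails: PostgreSQL rejects palloc/repalloc requests larger than MaxAllocSize (0x3FFFFFFF, i.e. 1 GB - 1) via AllocSizeIsValid(). The sketch below checks the size from the log against that limit. The 16-byte message size is an assumption about sizeof(SharedInvalidationMessage) on common builds, and the helper names are hypothetical.

```python
# Sketch: compare the failing request size against PostgreSQL's MaxAllocSize.
# Assumption: each invalidation message (SharedInvalidationMessage) is 16 bytes.

MSG_SIZE = 16
MAX_ALLOC_SIZE = 0x3FFFFFFF  # MaxAllocSize (1 GB - 1)


def request_size(n_msgs, msg_size=MSG_SIZE):
    """Bytes requested when growing a list to n_msgs messages."""
    return n_msgs * msg_size


def is_valid_alloc(size):
    """Mirrors AllocSizeIsValid(): anything above MaxAllocSize is rejected."""
    return size <= MAX_ALLOC_SIZE


failing = 1555752832                  # request size from the log above
print(is_valid_alloc(failing))        # → False
print(failing // MSG_SIZE)            # → 97234552 (implied message count)
print(MAX_ALLOC_SIZE // MSG_SIZE)     # → 67108863 (largest list that fits)
```

Under that assumption, the failing repalloc was trying to hold roughly 97 million invalidation messages, well beyond the roughly 67 million that fit in one allocation.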

I have also attached the test script.
I also ran the test with Kuroda-san's patch applied, and the issue did
not reproduce.

Thanks and Regards,
Shlok Kyal

Attachment: 036_test.pl (application/octet-stream, 6.1 KB)