From: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-22 12:23:47 |
Message-ID: | CANhcyEW8UyMr_7idB580DT3bjtB=EKiHwecTx5KC3ggiVs9c+A@mail.gmail.com |
Lists: | pgsql-bugs |
On Wed, 21 May 2025 at 17:18, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear hackers,
>
> > I think the problem here is that when we are distributing
> > invalidations to a concurrent transaction, in addition to queuing the
> > invalidations as a change, we also copy the distributed invalidations
> > along with the original transaction's invalidations via repalloc in
> > ReorderBufferAddInvalidations. So, when there are many in-progress
> > transactions, each would try to copy all its accumulated invalidations
> > to the remaining in-progress transactions. This could lead to such an
> > increase in allocation request size. However, after queuing the
> > change, we don't need to copy it along with the original transaction's
> > invalidations. This is because the copy is only required when we don't
> > process any changes in cases like ReorderBufferForget(). I have
> > analyzed all such cases, and my analysis is as follows:
>
> Based on the analysis, I created a PoC which avoids the repalloc().
> Invalidation messages distributed by SnapBuildDistributeSnapshotAndInval() are
> no longer added to the list; they are only queued, so the repalloc can be skipped.
> Also, the function distributes only the messages in the list, so received
> messages won't be sent again.
>
> I have created a patch for PG17 for testing purposes. Duncan, could you apply it
> and confirm whether it resolves the issue?
>
Hi,
I was able to reproduce the issue with the following test:
1. Begin 9 concurrent transactions (BEGIN; INSERT INTO t1 VALUES (11);).
2. In a 10th concurrent transaction, perform 1000 DDL statements (ALTER PUBLICATION ADD/DROP TABLE).
3. For each of the 9 concurrent transactions, perform:
   i. Run 1000 DDL statements
   ii. COMMIT;
   iii. BEGIN; INSERT INTO t1 VALUES (11);
4. Repeat steps 2 and 3 in a loop.
These steps reproduced the error:
2025-05-22 19:03:35.111 JST [63150] sub1 ERROR: invalid memory alloc
request size 1555752832
2025-05-22 19:03:35.111 JST [63150] sub1 STATEMENT: START_REPLICATION
SLOT "sub1" LOGICAL 0/0 (proto_version '4', streaming 'parallel',
origin 'any', publication_names '"pub1"')
I have attached the test script.
I also ran the test with Kuroda-san's patch applied, and the issue did
not reproduce.
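For intuition only, the growth described in the quoted analysis can be
sketched with a toy model of the test steps. This is not PostgreSQL code:
the function simulate(), its parameters, and the assumption that each DDL
statement produces exactly one invalidation message are all made up for
illustration. With copy_distributed=True, messages received from a
committing transaction are appended to the receiver's own list and so are
sent again when the receiver commits. With copy_distributed=False, received
messages are only queued, which is the idea behind the PoC.

```python
def simulate(rounds, n_txns=9, msgs_per_txn=1000, copy_distributed=True):
    """Toy model: return the largest per-transaction invalidation list seen.

    Hypothetical simplification: one invalidation message per DDL statement,
    and list lengths are counted in messages, not bytes.
    """
    lists = [0] * n_txns  # messages in each in-progress transaction's list
    peak = 0

    def distribute(count, skip=None):
        # Append `count` messages to every in-progress list except `skip`.
        # When received messages are only queued, the lists do not grow.
        nonlocal peak
        if not copy_distributed:
            return
        for j in range(n_txns):
            if j != skip:
                lists[j] += count
        peak = max(peak, max(lists))

    for _ in range(rounds):
        # Step 2: a separate transaction runs its DDL and commits.
        distribute(msgs_per_txn)
        # Step 3: each in-progress transaction runs DDL, commits, restarts.
        for i in range(n_txns):
            lists[i] += msgs_per_txn
            committed = lists[i]
            peak = max(peak, committed)
            lists[i] = 0  # the new transaction starts with an empty list
            distribute(committed, skip=i)
    return peak


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(r, simulate(r, copy_distributed=True),
              simulate(r, copy_distributed=False))
```

In this model each commit roughly doubles the largest list when received
messages are copied, while the list never exceeds one transaction's own
messages when they are only queued.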
Thanks and Regards,
Shlok Kyal
Attachment | Content-Type | Size |
---|---|---|
036_test.pl | application/octet-stream | 6.1 KB |