From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-23 17:31:54
Message-ID: CAD21AoCeM2nni1P7Z6KXzLM=6zCdShC82sOvuvu0_hBuJkm9Qw@mail.gmail.com
Lists: pgsql-bugs
On Thu, May 22, 2025 at 3:57 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, May 21, 2025 at 11:54 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, May 21, 2025 at 4:12 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> >
> > > However, after queuing the
> > > change, we don't need to copy it along with the original transaction's
> > > invalidations. This is because the copy is only required when we don't
> > > process any changes in cases like ReorderBufferForget().
> >
> > It seems that we use the accumulated invalidation message also after
> > replaying or concurrently aborting a transaction via
> > ReorderBufferExecuteInvalidations(). Do we need to consider such cases
> > too?
> >
>
> Good point. After replaying the transaction, it doesn't matter because
> we would have already relayed the required invalidation while
> processing REORDER_BUFFER_CHANGE_INVALIDATION messages.
I think the reason why we execute all invalidation messages even in
non-concurrent-abort cases is that we also need to invalidate the
caches that are loaded during replay. Consider the following
sequence:
1) S1: CREATE TABLE d (data text not null);
2) S1: INSERT INTO d VALUES ('d1');
3) S2: BEGIN; INSERT INTO d VALUES ('d2');
4) S3: BEGIN; INSERT INTO d VALUES ('d3');
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO d VALUES ('d4');
7) S2: COMMIT;
8) S3: COMMIT;
9) S2: INSERT INTO d VALUES('d5');
10) S1: INSERT INTO d VALUES ('d6');
When replaying S2's first transaction at 7), we decode the insert from
3) using the snapshot from before the ALTER, building the cache for
table 'd'. Then we invalidate the cache via the inval message
distributed from S1's ALTER, and build the relcache again when
decoding the insert from 6); that cache reflects the state after the
ALTER. When replaying S3's transaction at 8), we should decode the
insert from 4) using the snapshot from before the ALTER. Since we call
ReorderBufferExecuteInvalidations() in non-concurrent-abort paths as
well, we can invalidate the relcache built while decoding the insert
from 6). If we don't include the inval message distributed at 5) in
txn->invalidations, we don't invalidate that relcache and end up
sending the insert from 4) even though it happened before the ALTER.
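To make the committed case above concrete, here is a toy model (illustrative Python, not PostgreSQL code; the dict standing in for the relcache and all names are made up) showing why executing txn->invalidations after replay lets S3's replay rebuild the cache from its own snapshot:

```python
# Toy model of the commit-path scenario: the relcache is a dict,
# snapshots record whether table d is in publication pb.
relcache = {}
pre_alter, post_alter = {"d": False}, {"d": True}

def decode_insert(table, snapshot):
    # The cache is built lazily from whatever snapshot is in use,
    # then reused until it is invalidated.
    if table not in relcache:
        relcache[table] = snapshot[table]
    return relcache[table]

def execute_invalidations(tables):
    for t in tables:
        relcache.pop(t, None)

# Replay S2 (commit at 7):
assert decode_insert("d", pre_alter) is False    # insert from 3)
execute_invalidations(["d"])                     # inval distributed from the ALTER
assert decode_insert("d", post_alter) is True    # insert from 6)

# End of S2's replay: execute txn->invalidations (which must include
# the distributed message), clearing the post-ALTER cache.
execute_invalidations(["d"])

# Replay S3 (commit at 8): the insert from 4) rebuilds the cache from
# the pre-ALTER snapshot and is correctly treated as unpublished.
assert decode_insert("d", pre_alter) is False
```

Skipping the final execute_invalidations() call leaves the post-ALTER cache in place, and the insert from 4) would wrongly be treated as published.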
> However for
> concurrent abort case it could matter. See my analysis for the same
> below:
>
> Simulation of concurrent abort
> ------------------------------------------
> 1) S1: CREATE TABLE d(data text not null);
> 2) S1: INSERT INTO d VALUES('d1');
> 3) S2: BEGIN; INSERT INTO d VALUES('d2');
> 4) S2: INSERT INTO unrelated_tab VALUES(1);
> 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> 6) S2: INSERT INTO unrelated_tab VALUES(2);
> 7) S2: ROLLBACK;
> 8) S2: INSERT INTO d VALUES('d3');
> 9) S1: INSERT INTO d VALUES('d4');
>
> The problem with the sequence is that the insert from 3) could be
> decoded *after* 5) in step 6) due to streaming and that to decode the
> insert (which happened before the ALTER) the catalog snapshot and
> cache state is from *before* the ALTER TABLE. Because the transaction
> started in 3) doesn't actually modify any catalogs, no invalidations
> are executed after decoding it. Now, assume, while decoding Insert
> from 4), we detected a concurrent abort, then the distributed
> invalidation won't be executed, and if we don't have accumulated
> messages in txn->invalidations, then the invalidation from step 5)
> won't be performed. The data loss can occur in steps 8 and 9. This is
> just a theory, so I could be missing something.
This scenario makes sense to me. I agree that this turns out to be a problem.
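The concurrent-abort scenario can be modeled the same way (again an illustrative Python sketch, not PostgreSQL code; the dict stands in for the relcache):

```python
# Toy model of the concurrent-abort data loss: no invalidation ever
# runs, so the stale cache built in step 6) survives.
relcache = {}
pre_alter, post_alter = {"d": False}, {"d": True}

def decode_insert(table, snapshot):
    # cache built lazily from the snapshot in use, then reused
    if table not in relcache:
        relcache[table] = snapshot[table]
    return relcache[table]

# Step 6): the insert from 3) is decoded after the ALTER but with the
# pre-ALTER snapshot, caching "d is not published".
assert decode_insert("d", pre_alter) is False

# Concurrent abort while decoding the insert from 4): the queued
# invalidation change is never executed, and without a copy in
# txn->invalidations nothing clears the cache.

# Steps 8)/9): committed inserts into d reuse the stale cache and are
# silently skipped -- data loss.
assert decode_insert("d", post_alter) is False   # should be True
```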
>
> If the above turns out to be a problem, one idea for fixing it is that
> for the concurrent abort case (both during streaming and for prepared
> transaction's processing), we still check all the remaining changes
> and process only the changes related to invalidations. This has to be
> done before the current txn changes are freed via
> ReorderBufferResetTXN->ReorderBufferTruncateTXN.
>
> Thoughts?
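The fix quoted above could be sketched roughly as follows (illustrative Python, not actual PostgreSQL code; the function and data shapes are assumptions standing in for ReorderBufferResetTXN/ReorderBufferTruncateTXN):

```python
# Sketch of the proposed fix: on concurrent abort, scan the remaining
# queued changes and execute only the invalidation ones before the
# changes are freed.
relcache = {"d": True}   # stale state built during replay

def execute_invalidations(tables):
    for t in tables:
        relcache.pop(t, None)

def reset_txn(changes):
    # Before truncating the transaction's changes, replay just the
    # invalidation changes so no stale cache entries survive.
    for kind, payload in changes:
        if kind == "invalidation":
            execute_invalidations(payload)
    changes.clear()

changes = [("insert", "d"), ("invalidation", ["d"]), ("insert", "d")]
reset_txn(changes)
assert "d" not in relcache and changes == []
```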
If the above hypothesis is true, we need to consider another idea so
that we can execute invalidation messages in both cases.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com