From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-23 17:31:54
Message-ID: CAD21AoCeM2nni1P7Z6KXzLM=6zCdShC82sOvuvu0_hBuJkm9Qw@mail.gmail.com
Lists: pgsql-bugs
On Thu, May 22, 2025 at 3:57 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, May 21, 2025 at 11:54 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, May 21, 2025 at 4:12 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> >
> > > However, after queuing the
> > > change, we don't need to copy it along with the original transaction's
> > > invalidations. This is because the copy is only required when we don't
> > > process any changes in cases like ReorderBufferForget().
> >
> > It seems that we use the accumulated invalidation message also after
> > replaying or concurrently aborting a transaction via
> > ReorderBufferExecuteInvalidations(). Do we need to consider such cases
> > too?
> >
>
> Good point. After replaying the transaction, it doesn't matter because
> we would have already relayed the required invalidation while
> processing REORDER_BUFFER_CHANGE_INVALIDATION messages.
I think the reason why we execute all invalidation messages even in
non-concurrent-abort cases is that we also need to invalidate the
caches that are loaded during replay. Consider the following
sequence:
1) S1: CREATE TABLE d (data text not null);
2) S1: INSERT INTO d VALUES ('d1');
3) S2: BEGIN; INSERT INTO d VALUES ('d2');
4) S3: BEGIN; INSERT INTO d VALUES ('d3');
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO d VALUES ('d4');
7) S2: COMMIT;
8) S3: COMMIT;
9) S2: INSERT INTO d VALUES('d5');
10) S1: INSERT INTO d VALUES ('d6');
When replaying S2's first transaction at 7), we decode the insert from
3) using the snapshot from before the ALTER, building the cache for
table 'd'. Then we invalidate the cache via the inval message
distributed from S1's ALTER, and build the relcache again when
decoding the insert from 6); that cache reflects the state after the
ALTER. When replaying S3's transaction at 8), we should decode the
insert from 4) using the snapshot from before the ALTER. Since we call
ReorderBufferExecuteInvalidations() in non-concurrent-abort paths as
well, we can invalidate the relcache built while decoding the insert
from 6). If we don't include the inval message distributed at 5) in
txn->invalidations, we don't invalidate that relcache and end up
sending the insert from 4) even though it happened before the ALTER.
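To make the committed case above concrete, here is a toy model (illustrative Python, not PostgreSQL code; the dict standing in for the relcache and all names are made up) showing why executing txn->invalidations after replay lets S3's replay rebuild the cache from its own snapshot:

```python
# Toy model of the commit-path scenario: the relcache is a dict,
# snapshots record whether table d is in publication pb.
relcache = {}
pre_alter, post_alter = {"d": False}, {"d": True}

def decode_insert(table, snapshot):
    # The cache is built lazily from whatever snapshot is in use,
    # then reused until it is invalidated.
    if table not in relcache:
        relcache[table] = snapshot[table]
    return relcache[table]

def execute_invalidations(tables):
    for t in tables:
        relcache.pop(t, None)

# Replay S2 (commit at 7):
assert decode_insert("d", pre_alter) is False    # insert from 3)
execute_invalidations(["d"])                     # inval distributed from the ALTER
assert decode_insert("d", post_alter) is True    # insert from 6)

# End of S2's replay: execute txn->invalidations (which must include
# the distributed message), clearing the post-ALTER cache.
execute_invalidations(["d"])

# Replay S3 (commit at 8): the insert from 4) rebuilds the cache from
# the pre-ALTER snapshot and is correctly treated as unpublished.
assert decode_insert("d", pre_alter) is False
```

Skipping the final execute_invalidations() call leaves the post-ALTER cache in place, and the insert from 4) would wrongly be treated as published.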
> However for
> concurrent abort case it could matter. See my analysis for the same
> below:
>
> Simulation of concurrent abort
> ------------------------------------------
> 1) S1: CREATE TABLE d(data text not null);
> 2) S1: INSERT INTO d VALUES('d1');
> 3) S2: BEGIN; INSERT INTO d VALUES('d2');
> 4) S2: INSERT INTO unrelated_tab VALUES(1);
> 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> 6) S2: INSERT INTO unrelated_tab VALUES(2);
> 7) S2: ROLLBACK;
> 8) S2: INSERT INTO d VALUES('d3');
> 9) S1: INSERT INTO d VALUES('d4');
>
> The problem with the sequence is that the insert from 3) could be
> decoded *after* 5) in step 6) due to streaming and that to decode the
> insert (which happened before the ALTER) the catalog snapshot and
> cache state is from *before* the ALTER TABLE. Because the transaction
> started in 3) doesn't actually modify any catalogs, no invalidations
> are executed after decoding it. Now, assume, while decoding Insert
> from 4), we detected a concurrent abort, then the distributed
> invalidation won't be executed, and if we don't have accumulated
> messages in txn->invalidations, then the invalidation from step 5)
> won't be performed. The data loss can occur in steps 8 and 9. This is
> just a theory, so I could be missing something.
This scenario makes sense to me. I agree that this turns out to be a problem.
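The concurrent-abort scenario can be modeled the same way (again an illustrative Python sketch, not PostgreSQL code; the dict stands in for the relcache):

```python
# Toy model of the concurrent-abort data loss: no invalidation ever
# runs, so the stale cache built in step 6) survives.
relcache = {}
pre_alter, post_alter = {"d": False}, {"d": True}

def decode_insert(table, snapshot):
    # cache built lazily from the snapshot in use, then reused
    if table not in relcache:
        relcache[table] = snapshot[table]
    return relcache[table]

# Step 6): the insert from 3) is decoded after the ALTER but with the
# pre-ALTER snapshot, caching "d is not published".
assert decode_insert("d", pre_alter) is False

# Concurrent abort while decoding the insert from 4): the queued
# invalidation change is never executed, and without a copy in
# txn->invalidations nothing clears the cache.

# Steps 8)/9): committed inserts into d reuse the stale cache and are
# silently skipped -- data loss.
assert decode_insert("d", post_alter) is False   # should be True
```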
>
> If the above turns out to be a problem, one idea for fixing it is that
> for the concurrent abort case (both during streaming and for prepared
> transaction's processing), we still check all the remaining changes
> and process only the changes related to invalidations. This has to be
> done before the current txn changes are freed via
> ReorderBufferResetTXN->ReorderBufferTruncateTXN.
>
> Thoughts?
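The fix quoted above could be sketched roughly as follows (illustrative Python, not actual PostgreSQL code; the function and data shapes are assumptions standing in for ReorderBufferResetTXN/ReorderBufferTruncateTXN):

```python
# Sketch of the proposed fix: on concurrent abort, scan the remaining
# queued changes and execute only the invalidation ones before the
# changes are freed.
relcache = {"d": True}   # stale state built during replay

def execute_invalidations(tables):
    for t in tables:
        relcache.pop(t, None)

def reset_txn(changes):
    # Before truncating the transaction's changes, replay just the
    # invalidation changes so no stale cache entries survive.
    for kind, payload in changes:
        if kind == "invalidation":
            execute_invalidations(payload)
    changes.clear()

changes = [("insert", "d"), ("invalidation", ["d"]), ("insert", "d")]
reset_txn(changes)
assert "d" not in relcache and changes == []
```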
If the above hypothesis is true, we need to consider another idea so
that we can execute invalidation messages in both cases.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com