From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-05-26 09:22:30 |
Message-ID: | OS7PR01MB14968B3C263074A2DEB77DB58F565A@OS7PR01MB14968.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
Dear Sawada-san,
> I think the reason why we execute all invalidation messages even in
> non concurrent abort cases is that we need to invalidate all caches as
> well that are loaded during the replay. Consider the following
> sequences:
>
> 1) S1: CREATE TABLE d (data text not null);
> 2) S1: INSERT INTO d VALUES ('d1');
> 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> 6) S2: INSERT INTO d VALUES ('d4');
> 7) S2: COMMIT;
> 8) S3: COMMIT;
> 9) S2: INSERT INTO d VALUES('d5');
> 10) S1: INSERT INTO d VALUES ('d6');
>
> When replaying S2's first transaction at 7), we decode the insert from
> 3) using the snapshot which is from before the ALTER, creating the
> cache for table 'd'. Then we invalidate the cache by the inval message
> distributed from S1's the ALTER and then build the relcache again when
> decoding the insert from 6). The cache is the state after the ALTER.
> When replaying S3's transaction at 8), we should decode the insert
> from 4) using the snapshot which is from before the ALTER. Since we
> call ReorderBufferExecuteInvalidations() also in non concurrent abort
> paths, we can invalidate the relcache built when decoding the insert
> from 6). If we don't include the inval message distributed from 5) to
> txn->invalidations, we don't invalidate the relcache and end up
> sending the insert from 4) even though it happened before the ALTER.
You're right. I tested the workload on the latest PG17 and on the PoC, and confirmed
that the PoC replicated the d3 tuple, which is not good.
> If the above hypothesis is true, we need to consider another idea so
> that we can execute invalidation messages in both cases.
The straightforward fix is to also check the change queue when the transaction
has invalidation messages. 0003 implements that. One downside is that traversing
the changes can affect performance. Currently we iterate over all of the changes
even if there is only a single REORDER_BUFFER_CHANGE_INVALIDATION. I cannot find
a better solution for now.
Thoughts?
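To illustrate the performance concern, below is a minimal C sketch of a full scan over a change queue. It is not taken from the patch: ToyChange, ToyTxn, and toy_count_queued_invalidations are hypothetical stand-ins for the real ReorderBuffer structures, and the queue is simplified to a singly linked list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical change kinds; the real enum has many more members. */
typedef enum ToyChangeKind
{
    TOY_CHANGE_INSERT,
    TOY_CHANGE_INVALIDATION
} ToyChangeKind;

/* One queued change (hypothetical stand-in for a ReorderBuffer change). */
typedef struct ToyChange
{
    ToyChangeKind kind;
    struct ToyChange *next;
} ToyChange;

/* One transaction (hypothetical stand-in for a ReorderBuffer transaction). */
typedef struct ToyTxn
{
    size_t ninvalidations;  /* invalidation messages attached to the txn */
    ToyChange *changes;     /* head of the change queue */
} ToyTxn;

/*
 * Count the invalidation entries in the change queue.  The queue is only
 * walked when the transaction has invalidation messages, but then every
 * change is visited, so the cost grows with the total number of changes
 * even if just one of them is an invalidation.  *visited reports how many
 * queue entries were touched.
 */
static size_t
toy_count_queued_invalidations(const ToyTxn *txn, size_t *visited)
{
    size_t found = 0;
    size_t seen = 0;
    const ToyChange *change;

    assert(txn != NULL);

    if (txn->ninvalidations > 0)
    {
        for (change = txn->changes; change != NULL; change = change->next)
        {
            seen++;
            if (change->kind == TOY_CHANGE_INVALIDATION)
                found++;
        }
    }

    if (visited != NULL)
        *visited = seen;
    return found;
}
```

For a queue of two inserts with one invalidation between them, the function visits all three entries to find the single invalidation. A transaction without invalidation messages skips the walk entirely.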
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
v3-PG17-0001-Avoid-distributing-invalidation-messages-sev.patch | application/octet-stream | 8.5 KB |