RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-22 12:59:52
Message-ID: OSCPR01MB149669E1CAFE63051244F8E35F599A@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-bugs

Dear Amit, Sawada-san,

> Good point. After replaying the transaction, it doesn't matter because
> we would have already relayed the required invalidation while
> processing REORDER_BUFFER_CHANGE_INVALIDATION messages. However for
> concurrent abort case it could matter. See my analysis for the same
> below:
>
> Simulation of concurrent abort
> ------------------------------------------
> 1) S1: CREATE TABLE d(data text not null);
> 2) S1: INSERT INTO d VALUES('d1');
> 3) S2: BEGIN; INSERT INTO d VALUES('d2');
> 4) S2: INSERT INTO unrelated_tab VALUES(1);
> 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> 6) S2: INSERT INTO unrelated_tab VALUES(2);
> 7) S2: ROLLBACK;
> 8) S2: INSERT INTO d VALUES('d3');
> 9) S1: INSERT INTO d VALUES('d4');

> The problem with the sequence is that the insert from 3) could be
> decoded *after* 5) in step 6) due to streaming and that to decode the
> insert (which happened before the ALTER) the catalog snapshot and
> cache state is from *before* the ALTER TABLE. Because the transaction
> started in 3) doesn't actually modify any catalogs, no invalidations
> are executed after decoding it. Now, assume, while decoding Insert
> from 4), we detected a concurrent abort, then the distributed
> invalidation won't be executed, and if we don't have accumulated
> messages in txn->invalidations, then the invalidation from step 5)
> won't be performed. The data loss can occur in steps 8 and 9. This is
> just a theory, so I could be missing something.

I verified whether this is real, and succeeded in reproducing it. See the appendix
for the detailed steps.

> If the above turns out to be a problem, one idea for fixing it is that
> for the concurrent abort case (both during streaming and for prepared
> transaction's processing), we still check all the remaining changes
> and process only the changes related to invalidations. This has to be
> done before the current txn changes are freed via
> ReorderBufferResetTXN->ReorderBufferTruncateTXN.

I roughly implemented that part; PSA the updated version. One concern is whether we
should consider the case where executing the invalidations raises an ereport(ERROR).
If that happens, the walsender will exit at that point.
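
To illustrate the approach, here is a minimal sketch (not the attached patch: the
helper name is invented, and spilled-to-disk changes and error handling are ignored)
of a pass that executes only the invalidation messages among the remaining changes:
```
/*
 * Hypothetical helper: on concurrent abort, walk the transaction's
 * remaining in-memory changes and execute only the invalidation
 * messages, before ReorderBufferResetTXN() frees the changes via
 * ReorderBufferTruncateTXN().
 */
static void
ReorderBufferExecuteRemainingInvals(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
	dlist_iter	iter;

	dlist_foreach(iter, &txn->changes)
	{
		ReorderBufferChange *change =
			dlist_container(ReorderBufferChange, node, iter.cur);

		/* Process only invalidation changes; skip everything else. */
		if (change->action != REORDER_BUFFER_CHANGE_INVALIDATION)
			continue;

		ReorderBufferExecuteInvalidations(change->data.inval.ninvalidations,
										  change->data.inval.invalidations);
	}
}
```
Since ReorderBufferExecuteInvalidations() executes the messages locally, this is
also where the ereport(ERROR) concern above would apply.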

Appendix - reproducer
==============
Only one instance was used in the test. The defined objects were:
```
CREATE TABLE d(data text not null);
CREATE TABLE unrelated_tab(data text not null);
CREATE PUBLICATION pb;
```

Then, pg_recvlogical was used to replicate the changes. The actual command:
```
$ pg_recvlogical --plugin=pgoutput --create-slot --start --slot test -U postgres
-d postgres -o proto_version=4 -o publication_names=pb -o messages=true
-o streaming=true -f -
```

Below are the actual steps. The gdb debugger was used to synchronize the sessions
(a sketch of the gdb commands follows the steps).

0. Prepared two sessions S1 and S2, and one replication connection.
1. Ran "INSERT INTO d VALUES('d1');" on S1.
2. Ran "BEGIN; INSERT INTO d VALUES('d2');" on S2.
3. Ran "INSERT INTO unrelated_tab VALUES('d2');" on S2.
4. Ran "ALTER PUBLICATION pb ADD TABLE d;" on S1.
5. Attached gdb to the walsender process.
6. Set a breakpoint at HandleConcurrentAbort.
7. Ran "INSERT INTO unrelated_tab VALUES(generate_series(1, 5000));" on S2.
This triggers streaming of the changes accumulated in S2.
8. Confirmed that gdb stopped the walsender process.
9. Ran the continue command in gdb several times, to ensure the process
accessed "unrelated_tab". On my environment, the backtrace at that time was [1].
10. Ran "ROLLBACK;" in S2.
11. In the gdb session, advanced the program and ensured that the
concurrent-abort error was raised.
12. Detached gdb from the walsender.
13. Ran "INSERT INTO d VALUES('d3');" on S2.
14. Ran "INSERT INTO d VALUES('d4');" on S1.
15. Checked the output from pg_recvlogical, and confirmed that 'd3' and 'd4' were
not output [2].
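
For completeness, the gdb side of steps 5, 6, 8, 9, and 12 was roughly as follows
(illustrative only; the PID lookup and the number of continues depend on the
environment):
```
$ gdb -p <walsender PID>
(gdb) break HandleConcurrentAbort
(gdb) continue    # repeat until the backtrace [1] shows "unrelated_tab"
(gdb) bt
(gdb) detach
(gdb) quit
```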

[1]
```
Breakpoint 1, HandleConcurrentAbort () at ../postgres/src/backend/access/index/genam.c:484
484 if (TransactionIdIsValid(CheckXidAlive) &&
(gdb) bt
#0 HandleConcurrentAbort () at ../postgres/src/backend/access/index/genam.c:484
#1 0x000000000052628a in systable_getnext (sysscan=0x31bcbd0)
at ../postgres/src/backend/access/index/genam.c:545
#2 0x0000000000b37afa in SearchCatCacheMiss (cache=0x3107180, nkeys=1, hashValue=2617776010,
hashIndex=10, v1=16389, v2=0, v3=0, v4=0)
at ../postgres/src/backend/utils/cache/catcache.c:1544
#3 0x0000000000b379a3 in SearchCatCacheInternal (cache=0x3107180, nkeys=1, v1=16389, v2=0, v3=0,
v4=0) at ../postgres/src/backend/utils/cache/catcache.c:1464
#4 0x0000000000b3769a in SearchCatCache1 (cache=0x3107180, v1=16389)
at ../postgres/src/backend/utils/cache/catcache.c:1332
#5 0x0000000000b544d5 in SearchSysCache1 (cacheId=55, key1=16389)
at ../postgres/src/backend/utils/cache/syscache.c:228
#6 0x0000000000b3e62a in get_rel_namespace (relid=16389)
at ../postgres/src/backend/utils/cache/lsyscache.c:1956
#7 0x00007fa3fdb1e0ec in get_rel_sync_entry (data=0x3160108, relation=0x7fa3fd06f398)
at ../postgres/src/backend/replication/pgoutput/pgoutput.c:2037
#8 0x00007fa3fdb1d126 in pgoutput_change (ctx=0x315fd90, txn=0x31b8aa0, relation=0x7fa3fd06f398,
change=0x31bab18) at ../postgres/src/backend/replication/pgoutput/pgoutput.c:1455
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) f 8
#8 0x00007fa3fdb1d126 in pgoutput_change (ctx=0x315fd90, txn=0x31b8aa0, relation=0x7fa3fd06f398,
change=0x31bab18) at ../postgres/src/backend/replication/pgoutput/pgoutput.c:1455
1455 relentry = get_rel_sync_entry(data, relation);
(gdb) p relation->rd_rel.relname
$2 = {data = "unrelated_tab", '\000' <repeats 50 times>}
```
[2]: (the 'S', 'E', and 'A' bytes are the Stream Start, Stream Stop, and Stream
Abort protocol messages; no changes for 'd3' or 'd4' appear)
```
$ pg_recvlogical --plugin=pgoutput --create-slot --start --slot test -U postgres
-d postgres -o proto_version=4 -o publication_names=pb -o messages=true
-o streaming=true -f -
S
E
A
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v2-PG17-0001-Avoid-distributing-invalidation-messages-sev.patch application/octet-stream 7.1 KB
