| From: | Vishal Prasanna <vishal(dot)g(at)zohocorp(dot)com> |
|---|---|
| To: | "pgsql-bugs" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
| Subject: | [BUG] Assert failure in ReorderBufferReturnTXN during logical decoding due to leaked specinsert change |
| Date: | 2026-02-19 13:43:10 |
| Message-ID: | 19c7623e882.4080fd5426212.311756747309556767@zohocorp.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
Hi,
We recently encountered an issue with the Postgres server. After we dropped a publication
that was being used by a subscriber, the walsender process on the publisher started crashing
with an assertion failure in `ReorderBufferReturnTXN()` while decoding changes from the replication slot.
The error message:
"TRAP: failed Assert("txn->size == 0"), File: "reorderbuffer.c", Line: 494"
Issue observed in PostgreSQL 17.7 with assert-enabled builds.
TRAP: failed Assert("txn->size == 0"), File: "reorderbuffer.c", Line: 494, PID: 96941
0 postgres ExceptionalCondition + 216
1 postgres ReorderBufferReturnTXN + 284
2 postgres ReorderBufferCleanupTXN + 1292
3 postgres ReorderBufferProcessTXN + 4304
4 postgres ReorderBufferReplay + 284
5 postgres ReorderBufferCommit + 128
6 postgres DecodeCommit + 584
7 postgres xact_decode + 408
8 postgres LogicalDecodingProcessRecord + 192
9 postgres XLogSendLogical + 204
10 postgres WalSndLoop + 256
11 postgres StartLogicalReplication + 636
12 postgres exec_replication_command + 1384
13 postgres PostgresMain + 2476
14 postgres BackendInitialize + 0
15 postgres postmaster_child_launch + 304
16 postgres BackendStartup + 448
17 postgres ServerLoop + 372
18 postgres PostmasterMain + 6396
19 postgres startup_hacks + 0
20 dyld start + 7184
Simplified steps to reproduce:
Step 1: Create a table, set up a logical replication slot, and execute an `INSERT ... ON CONFLICT ... DO UPDATE` statement.
```
CREATE TABLE test_updates (id INT PRIMARY KEY, value TEXT);
SELECT * FROM pg_create_logical_replication_slot('testing_slot', 'pgoutput');
INSERT INTO test_updates (id, value) VALUES (1, 'first_insert')
ON CONFLICT(id) DO UPDATE SET value = excluded.value;
```
Step 2: Start logical decoding using `pg_logical_slot_peek_binary_changes` on the slot,
referencing a publication (`pub_1`) that does not exist.
```
SELECT *
FROM pg_logical_slot_peek_binary_changes('testing_slot', NULL, NULL, 'proto_version', '1', 'publication_names', 'pub_1');
```
After Step 2, the Postgres server crashes with the assertion failure shown above.
Reason for server crash:
1. In `ReorderBufferProcessTXN()`, when a `REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT`
is encountered, the change is unlinked from the list by `dlist_delete()` and stored in `specinsert`.
2. Next, when a `REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM` is encountered,
the stored `specinsert` change is retrieved and applied as a regular insert change.
3. Since publication `pub_1` does not exist, an error is raised when applying the change
and caught by the `PG_CATCH` block in `ReorderBufferProcessTXN()`. The error call stack is:
```
ReorderBufferProcessTXN()
ReorderBufferApplyChange()
→ rb->apply_change() [pgoutput_change()]
→ get_rel_sync_entry()
→ LoadPublications()
→ GetPublicationByName()
→ get_publication_oid() ← throws ERROR here
```
4. In the `PG_CATCH` block, the `specinsert` change is cleaned up only for
streaming/prepared transactions (via `ReorderBufferResetTXN()`) but not for non-streaming transactions.
In the `else` branch at line `2697`, `ReorderBufferCleanupTXN()` is called, which iterates through
the remaining changes and frees them. However, since the `specinsert` change was already unlinked
from the list, it is never found and never freed. This leaked change is still accounted for in `txn->size`,
causing the assertion failure.
Proposed Solution: Free the pending `specinsert` change in the `else` branch before calling `ReorderBufferCleanupTXN()`.
Since the `specinsert` change has already been unlinked, `ReorderBufferCleanupTXN()` cannot find or free it.
Explicitly freeing it here prevents the `Assert(txn->size == 0)` failure in `ReorderBufferReturnTXN()`.
```
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2691,12 +2691,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
ReorderBufferResetTXN(rb, txn, snapshot_now,
command_id, prev_lsn,
specinsert);
}
else
{
+ /* Return the spec insert change before cleaning up the transaction */
+ if (specinsert != NULL)
+ {
+ ReorderBufferReturnChange(rb, specinsert, true);
+ specinsert = NULL;
+ }
ReorderBufferCleanupTXN(rb, txn);
MemoryContextSwitchTo(ecxt);
PG_RE_THROW();
}
}
PG_END_TRY();
```
Along with this fix, we have also added a TAP test to verify in patch.
Additional Suggestion:
Currently in `PG_CATCH` block, `specinsert` is only freed in the `ERRCODE_TRANSACTION_ROLLBACK` branch
for streaming or prepared transactions, via `ReorderBufferResetTXN()` at line 2691.
Would it make sense to move the freeing of `specinsert` before the if/else branch,
so that it is always freed regardless of the error path? This would avoid duplication and ensure
that `specinsert` is always cleaned up.
Regards,
Vishal Prasanna
Zoho Corporation
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error.patch | application/octet-stream | 2.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Aaron Caskey-Demaret | 2026-02-19 14:47:18 | Error "could not access status of transaction" - version 14.21 |
| Previous Message | PG Bug reporting form | 2026-02-19 12:24:50 | BUG #19414: Wrong rows returned by index scan on HASH-partitioned table with PRIMARY KEY (PostgreSQL 16.9) |