pgsql: Prevent invalidation of newly synced replication slots.

From: Amit Kapila <akapila(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Prevent invalidation of newly synced replication slots.
Date: 2026-01-27 05:56:21
Message-ID: E1vkc3p-002uNv-0p@gemulon.postgresql.org
Lists: pgsql-committers

Prevent invalidation of newly synced replication slots.

A race condition could cause a newly synced replication slot to be
invalidated by a checkpoint running concurrently with its initial sync.

When syncing a replication slot to a standby, the slot's initial
restart_lsn is taken from the publisher's remote_restart_lsn. Because slot
sync happens asynchronously, this value can lag behind the standby's
current redo pointer. Without any interlocking between WAL reservation and
checkpoints, a checkpoint may remove WAL required by the newly synced
slot, causing the slot to be invalidated.
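To make the race concrete, here is a minimal single-threaded C sketch. It rests on several assumptions. An LSN is a plain uint64_t. WAL removal is modeled at byte rather than segment granularity. A slot counts as invalidated once its restart_lsn falls before the oldest WAL still available. The names lsn_t, NO_SLOT_LSN, checkpoint_remove_boundary, slot_invalidated and race_without_interlock are hypothetical and are not PostgreSQL APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical): an LSN is just a 64-bit WAL position. */
typedef uint64_t lsn_t;

/* Marker for a slot that exists but has no restart_lsn yet. */
#define NO_SLOT_LSN UINT64_MAX

/*
 * Hypothetical checkpoint step: keep WAL back to the redo pointer or to
 * the oldest slot restart_lsn, whichever is older. WAL before the returned
 * boundary is treated as removed, so the boundary is the oldest WAL
 * position still available afterwards.
 */
static lsn_t checkpoint_remove_boundary(lsn_t redo, const lsn_t *slot_lsns,
                                        int nslots)
{
    lsn_t keep = redo;

    for (int i = 0; i < nslots; i++)
        if (slot_lsns[i] != NO_SLOT_LSN && slot_lsns[i] < keep)
            keep = slot_lsns[i];
    return keep;
}

/* A slot is invalidated if the WAL it needs has already been removed. */
static bool slot_invalidated(lsn_t restart_lsn, lsn_t oldest_available)
{
    return restart_lsn < oldest_available;
}

/*
 * Unsafe ordering with no interlock: the checkpoint runs after the slot
 * is created but before slotsync copies remote_restart_lsn into it.
 * Returns true if the newly synced slot ends up invalidated.
 */
static bool race_without_interlock(lsn_t remote_restart_lsn, lsn_t standby_redo)
{
    lsn_t slots[1] = { NO_SLOT_LSN };   /* new slot, WAL not reserved yet */
    lsn_t oldest = checkpoint_remove_boundary(standby_redo, slots, 1);

    slots[0] = remote_restart_lsn;      /* slotsync sets the lagging LSN */
    return slot_invalidated(slots[0], oldest);
}
```

With remote_restart_lsn = 100 and a standby redo pointer of 200, race_without_interlock() returns true. The checkpoint keeps WAL only from 200 onward, so a restart_lsn of 100 points at removed WAL. With remote_restart_lsn = 300 the slot survives.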

To fix this, we acquire ReplicationSlotAllocationLock before reserving WAL
for a newly synced slot, similar to commit 006dd4b2e5. This ensures that
if WAL reservation happens first, the checkpoint process must wait for
slotsync to update the slot's restart_lsn before it computes the minimum
required LSN.
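The interlock can be sketched with a pthread mutex standing in for ReplicationSlotAllocationLock. This is an assumption of the sketch: the real lock is an LWLock with shared and exclusive modes, which this toy ignores. Slotsync publishes restart_lsn, and the checkpoint computes its minimum, inside critical sections on the same lock, so the two can only run in one of two serial orders. All names here are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

typedef uint64_t lsn_t;                 /* toy LSN (hypothetical) */
#define NO_SLOT_LSN UINT64_MAX          /* slot has no restart_lsn yet */

/* Stand-in for ReplicationSlotAllocationLock (hypothetical toy mutex). */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static lsn_t synced_slot_lsn = NO_SLOT_LSN;

/* Slotsync side: reserve WAL by publishing restart_lsn under the lock. */
static void slotsync_reserve(lsn_t remote_restart_lsn)
{
    pthread_mutex_lock(&alloc_lock);
    synced_slot_lsn = remote_restart_lsn;
    pthread_mutex_unlock(&alloc_lock);
}

/*
 * Checkpoint side: compute the minimum required LSN under the same lock.
 * It either sees the slot's restart_lsn or runs entirely before the
 * reservation; it can never observe a half-finished reservation.
 */
static lsn_t checkpoint_min_required(lsn_t redo)
{
    lsn_t keep;

    pthread_mutex_lock(&alloc_lock);
    keep = redo;
    if (synced_slot_lsn != NO_SLOT_LSN && synced_slot_lsn < keep)
        keep = synced_slot_lsn;
    pthread_mutex_unlock(&alloc_lock);
    return keep;
}
```

If the checkpoint takes the lock first, it keeps WAL only from the redo pointer. Once slotsync has reserved with an older LSN, the next computation keeps WAL from that LSN. The first order is the case the lock alone cannot fix.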

However, unlike in ReplicationSlotReserveWal(), this lock alone cannot
protect a newly synced slot if a checkpoint has already run
CheckPointReplicationSlots() before slotsync updates the slot. In such
cases, the remote restart_lsn may be stale and earlier than the current
redo pointer. To avoid relying on an outdated LSN, we use the oldest WAL
location still available on the standby if it is greater than the remote
restart_lsn.
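The clamping rule can be sketched as follows. The sketch assumes that the oldest available WAL position is the start of the oldest WAL segment still present on the standby. The helper names segment_start_lsn and initial_synced_restart_lsn are illustrative only, not the functions this commit adds to xlog.c or slotsync.c.

```c
#include <stdint.h>

typedef uint64_t lsn_t;   /* toy LSN (hypothetical) */

/*
 * Hypothetical helper: byte position where WAL segment `segno` starts,
 * given the WAL segment size in bytes.
 */
static lsn_t segment_start_lsn(uint64_t segno, uint64_t seg_size)
{
    return segno * seg_size;
}

/*
 * Hypothetical helper: initial restart_lsn for a newly synced slot.
 * Take the remote restart_lsn unless it points at WAL that has already
 * been removed, in which case start from the oldest available position.
 */
static lsn_t initial_synced_restart_lsn(lsn_t remote_restart_lsn,
                                        lsn_t oldest_available)
{
    return remote_restart_lsn > oldest_available ? remote_restart_lsn
                                                 : oldest_available;
}
```

For example, with 16 MB segments and oldest remaining segment number 3, the oldest available position is 50331648. A stale remote restart_lsn of 100 is raised to 50331648, while a remote restart_lsn of 60000000 is kept as-is.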

This ensures that newly synced slots always start with a safe, non-stale
restart_lsn and are not invalidated by concurrent checkpoints.

Author: Zhijie Hou <houzj(dot)fnst(at)fujitsu(dot)com>
Reviewed-by: Hayato Kuroda <kuroda(dot)hayato(at)fujitsu(dot)com>
Reviewed-by: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Reviewed-by: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>
Reviewed-by: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Backpatch-through: 17
Discussion: https://postgr.es/m/TY4PR01MB16907E744589B1AB2EE89A31F94D7A%40TY4PR01MB16907.jpnprd01.prod.outlook.com

Branch
------
REL_17_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/3243c0177efb77511269d6037ee668abd2e81396

Modified Files
--------------
src/backend/access/transam/xlog.c | 6 +-
src/backend/replication/logical/slotsync.c | 97 +++++++++++-----------
src/include/access/xlog.h | 1 +
src/test/recovery/t/046_checkpoint_logical_slot.pl | 84 ++++++++++++++++++-
4 files changed, 136 insertions(+), 52 deletions(-)
