Fix race in ReplicationSlotRelease for ephemeral slots

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Fix race in ReplicationSlotRelease for ephemeral slots
Date: 2026-05-27 11:50:16
Message-ID: TY4PR01MB177184FF9EE916F577E1F554194082@TY4PR01MB17718.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi,

While testing the slot release logic, I noticed a bug in
ReplicationSlotRelease(): it may access a replication slot array entry that it
has itself already released.

In detail: when releasing an ephemeral replication slot,
ReplicationSlotRelease() first drops the slot via ReplicationSlotDropAcquired().
From that point on, the slot's entry in the shared-memory slot array can be
reused immediately by another backend creating a new slot.

However, ReplicationSlotRelease() then continues with common cleanup code that
still dereferences the old slot pointer and updates shared-memory fields such as
effective_xmin. If the slot array entry has already been reallocated, these
writes can inadvertently modify a different, unrelated slot.
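
The hazard can be illustrated with a small stand-alone C model. This is a toy
sketch, not PostgreSQL source: ToySlot, toy_slots, toy_create, toy_drop,
toy_release_buggy and toy_other_backend_creates_slot are hypothetical names, a
plain static array stands in for the shared-memory slot array, and the
concurrent backend is simulated by a callback that runs at the point where the
window opens.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SLOTS 2
#define TOY_INVALID   0

/* Hypothetical, heavily simplified stand-in for a slot array entry. */
typedef struct ToySlot
{
    bool     in_use;
    bool     ephemeral;
    char     name[64];
    int      active_proc;      /* TOY_INVALID when no backend owns the slot */
    unsigned effective_xmin;
} ToySlot;

/* Stand-in for the shared-memory slot array. */
static ToySlot toy_slots[TOY_MAX_SLOTS];

/* Claim the first free entry, as a backend creating a slot would. */
static ToySlot *
toy_create(const char *name, bool ephemeral, int proc)
{
    for (size_t i = 0; i < TOY_MAX_SLOTS; i++)
    {
        ToySlot *s = &toy_slots[i];

        if (!s->in_use)
        {
            s->in_use = true;
            s->ephemeral = ephemeral;
            strncpy(s->name, name, sizeof(s->name) - 1);
            s->name[sizeof(s->name) - 1] = '\0';
            s->active_proc = proc;
            s->effective_xmin = 100;
            return s;
        }
    }
    return NULL;
}

/* Dropping only marks the entry free; the memory itself stays addressable. */
static void
toy_drop(ToySlot *s)
{
    s->in_use = false;
}

/* Simulated second backend: grabs whichever entry is free right now. */
static void
toy_other_backend_creates_slot(void)
{
    toy_create("test_slot_created", false, 4242);
}

/*
 * Buggy release: after dropping an ephemeral slot, the common cleanup below
 * still writes through the old pointer, which may now describe another slot.
 */
static void
toy_release_buggy(ToySlot *slot, void (*other_backend)(void))
{
    if (slot->ephemeral)
    {
        toy_drop(slot);
        if (other_backend)
            other_backend();   /* the entry can be reused at this point */
    }

    /* Common cleanup: a stale write if the entry was reused above. */
    slot->active_proc = TOY_INVALID;
    slot->effective_xmin = TOY_INVALID;
}
```

Calling toy_create("test_slot_dropped", true, 1111) and then passing the result
to toy_release_buggy() together with toy_other_backend_creates_slot leaves the
entry named test_slot_created marked in use but with active_proc reset, which
is the state that lets another backend acquire it.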

I am attaching a patch that avoids touching the slot's shared-memory state after
dropping an ephemeral slot. It keeps the post-release shared-memory updates only
for non-ephemeral slots, where the slot remains valid after release.
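
The shape of the fix can be sketched in the same toy style. This is only an
illustration under stated assumptions, not the attached patch: DemoSlot,
DemoReuseHook, demo_release_fixed and demo_reuse_entry are hypothetical names,
and the hook simulates another backend reusing the entry right after the drop.

```c
#include <stdbool.h>

#define DEMO_INVALID 0

/* Hypothetical, simplified stand-in for a slot array entry. */
typedef struct DemoSlot
{
    bool     in_use;
    bool     ephemeral;
    int      active_proc;      /* DEMO_INVALID when no backend owns the slot */
    unsigned effective_xmin;
} DemoSlot;

/* Hook run right after the drop; models a concurrent backend. */
typedef void (*DemoReuseHook)(DemoSlot *entry);

/* Simulated second backend that takes over the freed entry. */
static void
demo_reuse_entry(DemoSlot *entry)
{
    entry->in_use = true;
    entry->ephemeral = false;
    entry->active_proc = 4242;
    entry->effective_xmin = 100;
}

/*
 * Guarded release: shared fields are updated only on the non-ephemeral path,
 * where the entry still belongs to the releasing backend.
 */
static void
demo_release_fixed(DemoSlot *slot, DemoReuseHook after_drop)
{
    if (slot->ephemeral)
    {
        slot->in_use = false;      /* drop: the entry is free for reuse */
        if (after_drop)
            after_drop(slot);
        /* The entry may now describe another slot, so leave it alone. */
    }
    else
    {
        /* The slot survives the release, so these writes are still ours. */
        slot->active_proc = DEMO_INVALID;
        slot->effective_xmin = DEMO_INVALID;
    }

    /* Backend-local cleanup that touches no shared entry would follow here. */
}
```

With this structure, an entry reused by demo_reuse_entry keeps its new
active_proc and effective_xmin, while a non-ephemeral slot is still reset on
release as before.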

To reproduce, we can use the following steps:

1. Attach gdb to the backend and set a breakpoint in ReplicationSlotRelease()
right after ReplicationSlotDropAcquired() is called.
2. Create an ephemeral slot in the above backend with an invalid output plugin:
SELECT pg_create_logical_replication_slot('test_slot_dropped', 'pgoutput2', false, false, true);
3. Once the breakpoint is hit, start another backend and create a new slot
named 'test_slot_created'.
4. Continue past the breakpoint and let the first backend proceed. At this
point, you will see it set the active_proc field of the new slot
'test_slot_created' (and effective_xmin, if a snapshot is being exported) to
invalid values.
5. Start a third backend and attempt to acquire the same slot
'test_slot_created'. This should not be possible under normal circumstances,
but the bug allows it.

I haven't attached a test for this fix, as the change is straightforward and the
likelihood of hitting this bug is low, so it may not be worth spending test
cycles on it. However, if others feel differently, I'm happy to add one.

Best Regards,
Hou zj

Attachment Content-Type Size
v1-0001-Fix-race-in-ReplicationSlotRelease-for-ephemeral-.patch application/octet-stream 3.9 KB