| From: | Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> |
|---|---|
| To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix race in ReplicationSlotRelease for ephemeral slots |
| Date: | 2026-05-29 16:44:10 |
| Message-ID: | CAFC+b6o-hD5VxVLZQovmHSYykF8Qzq3eiuBU-U1F_yR9-y6P_w@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
On Wed, May 27, 2026 at 5:20 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> Hi,
>
> While testing the slot release logic, I noticed a bug in
> ReplicationSlotRelease(): it may access a replication slot array entry that
> it has already released itself.
>
> In detail: when releasing an ephemeral replication slot,
> ReplicationSlotRelease() first drops the slot via
> ReplicationSlotDropAcquired(). From that point on, the slot's shared-memory
> array entry can be immediately reused by another backend creating a new slot.
>
> However, ReplicationSlotRelease() then continues executing common cleanup
> code that still dereferences the old slot pointer and updates shared-memory
> fields such as effective_xmin. If the slot array entry has already been
> reallocated, these writes can inadvertently modify a different, unrelated
> slot.
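To make the failure mode concrete, here is a minimal, self-contained C model of the buggy ordering. It is only a sketch under stated assumptions: the Slot struct, its fields, and every function in it (create_slot(), try_acquire(), release_buggy(), run_buggy_scenario()) are hypothetical simplifications, not PostgreSQL's ReplicationSlot or its API, and all locking is omitted. The window callback stands in for the gdb breakpoint from the reproduction steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the shared slot array.
 * Field names echo the report; this is NOT PostgreSQL's ReplicationSlot. */
#define NUM_SLOTS    2
#define INVALID_PROC 0          /* "no process holds this slot" */
#define INVALID_XID  0u

typedef struct
{
    bool        in_use;
    bool        ephemeral;
    int         active_proc;
    unsigned    effective_xmin;
} Slot;

static Slot slots[NUM_SLOTS];

/* Creating a slot takes the first free array entry. */
Slot *
create_slot(int proc, unsigned xmin, bool ephemeral)
{
    for (int i = 0; i < NUM_SLOTS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i] = (Slot) {true, ephemeral, proc, xmin};
            return &slots[i];
        }
    }
    assert(!"no free slot");
    return NULL;
}

/* Acquiring is refused while some process holds the slot. */
bool
try_acquire(Slot *s, int proc)
{
    if (s->active_proc != INVALID_PROC)
        return false;
    s->active_proc = proc;
    return true;
}

/* Buggy shape: the ephemeral slot is dropped first, then the common cleanup
 * still writes through 's'. 'window' models the point after the drop where
 * another backend may reuse the freed entry. */
void
release_buggy(Slot *s, void (*window)(void))
{
    if (s->ephemeral)
        s->in_use = false;              /* entry is now free for reuse */
    if (window != NULL)
        window();
    s->active_proc = INVALID_PROC;      /* may hit someone else's slot */
    s->effective_xmin = INVALID_XID;
}

/* Scenario: process 100 releases an ephemeral slot while process 200
 * creates a new slot during the window. */
static Slot *created;

static void
backend_200_creates_slot(void)
{
    created = create_slot(200, 1000, false);
}

Slot *
run_buggy_scenario(void)
{
    Slot       *dropped = create_slot(100, 500, true);

    release_buggy(dropped, backend_200_creates_slot);
    return created;
}
```

In this model, run_buggy_scenario() returns the very entry that was freed by the drop; the stale writes have reset its active_proc and effective_xmin, so a try_acquire() from a third process succeeds even though process 200 created the slot.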
>
> I am attaching a patch that avoids touching slot shared-memory state after
> dropping an ephemeral slot. It keeps the post-release shared-memory updates
> only for non-ephemeral slots, where the slot remains valid after release.
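A sketch of the control-flow change described above, using a hypothetical simplified model: the Slot struct, release_fixed(), and reuse_entry() are illustrative names, not PostgreSQL code, and locking is omitted. The point it shows is that the cleanup of shared fields moves into an else branch, so the ephemeral path never dereferences the slot after dropping it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified slot entry; not PostgreSQL's ReplicationSlot. */
#define INVALID_PROC 0
#define INVALID_XID  0u

typedef struct
{
    bool        in_use;
    bool        ephemeral;
    int         active_proc;
    unsigned    effective_xmin;
} Slot;

static Slot entry;              /* a single shared entry is enough here */

/* Fixed shape: the shared-memory cleanup only runs for slots that still
 * exist after the release. 'window' models the moment after the drop when
 * another backend may reuse the entry. */
void
release_fixed(Slot *s, void (*window)(void))
{
    if (s->ephemeral)
    {
        s->in_use = false;              /* dropped: entry may be reused now */
        if (window != NULL)
            window();
        /* no further access to *s on this path */
    }
    else
    {
        s->active_proc = INVALID_PROC;  /* slot still exists and is ours */
        s->effective_xmin = INVALID_XID;
    }
}

/* Another backend reuses the freed entry for a new persistent slot. */
void
reuse_entry(void)
{
    entry = (Slot) {true, false, 200, 1000};
}
```

With this shape, a slot created in the freed entry keeps its owner and xmin, while releasing a persistent slot still clears its fields as before.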
>
> To reproduce, we can use the following steps:
>
> 1. Attach gdb to the backend and set a breakpoint in
>    ReplicationSlotRelease() right after ReplicationSlotDropAcquired() is
>    called.
> 2. In that backend, create an ephemeral slot with an invalid output plugin:
>    SELECT pg_create_logical_replication_slot('test_slot_dropped', 'pgoutput2', false, false, true);
> 3. Once the breakpoint is hit, start another backend and create a new slot
>    named 'test_slot_created'.
> 4. Release the breakpoint and let the first backend continue. At this
>    point, you will see it reset the active_proc of the new slot
>    'test_slot_created' (and its effective_xmin, if a snapshot is being
>    exported) to invalid values.
> 5. Start a third backend and attempt to acquire the same slot
>    'test_slot_created'. This should not be possible under normal
>    circumstances, but the bug allows it.
>
patch LGTM.
>
> I haven't attached a test for this fix, as the change is straightforward and
> the likelihood of encountering this bug is low, so it may not be worth adding
> test cycles for it. However, if others feel differently, I'm OK to add one.
>
+1 for a test. The fix is just an else branch, so a future refactor could
change it and silently reintroduce the corruption: since the bug scribbles on
an unrelated, reused slot, nothing would catch it. Injection points make the
race deterministic; I've attached a diff patch that adds a test which fails
without the fix and passes with it.
--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
| Attachment | Content-Type | Size |
|---|---|---|
| nocfbot-test.patch | application/octet-stream | 4.3 KB |