Re: Fix race in ReplicationSlotRelease for ephemeral slots

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
Date: 2026-06-02 06:00:29
Message-ID: CABPTF7VyH1-W2xnDspECDEzFGQj=WTFpZBCqKfM11OAZa6gQHQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Sat, May 30, 2026 at 4:14 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Saturday, May 30, 2026 1:44 AM Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> wrote:
>
> > > On Wed, May 27, 2026 at 5:20 PM Zhijie Hou (Fujitsu) <mailto:houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > I haven't attached a test for this fix, as the change is straightforward and the
> > > Likelihood of encountering this bug is low, so it may not be worth adding test
> > > cycles for it. However, if others feel differently, I'm OK to add one.
> >
> > +1 for a test. The fix is just an else, so a future refactor could change it and silently
> > reintroduce the corruption, since it scribbles on an unrelated reused slot, nothing
> > would catch it. Injection points make it deterministic; I've attached a diff patch that adds
> > a test that fails without the fix and passes with it.
>
> Thanks for the test.
>
> I'm not sure if adding an injection point for this rare case is worthwhile. Even
> if we were to add one, future refactoring of that function could shift the
> position of the injection point, so its long-term usefulness is uncertain. I
> don't have a strong opinion on this, so I'll leave it to Fujii-San to decide.

Thanks for reporting this issue. I can reproduce it with lldb on Mac.

postgres=# SELECT pg_create_logical_replication_slot(
postgres(# 'test_slot_dropped',
postgres(# 'pgoutput2',
postgres(# false,
postgres(# false,
postgres(# true
postgres(# );
ERROR: could not access file "pgoutput2": No such file or directory

787 * decoding be disabled.
788 */
789 ReplicationSlotDropAcquired(is_logical);
-> 790 }
791
792 /*
793 * If slot needed to temporarily restrain both data and catalog xmin to
Target 0: (postgres) stopped.
(lldb) expr -- slot->data.name.data
(char[64]) $0 = "test_slot_created"
(lldb) expr -- slot->data.persistency
(ReplicationSlotPersistency) $1 = RS_PERSISTENT
(lldb) expr -- slot->active_proc
(ProcNumber) $2 = 126

The fix looks good to me.

There's an adjacent bug around drop_local_obsolete_slots. The root
cause of them looks similar -- ReplicationSlot * is a pointer to a
reusable shared-memory array cell, not a durable identity for the same
slot. In drop_local_obsolete_slots, the issue is that the slot has
been freed after ReplicationSlotDropAcquired(false); however, another
backend may reuse the same cell before the unlock/log reads. This
seems less severe -- it does not normally corrupt slot state, because
the code only read after the drop. But it can still unlock/log
misusing the identity of a different slot. Attached a test using
injection point to reproduce it and a patch to fix it.

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

Attachment Content-Type Size
repro_slotsync_stale_pointer.pl text/x-perl-script 3.2 KB
v1-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patch application/octet-stream 3.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2026-06-02 06:14:46 Re: pg_createsubscriber: allow duplicate publication names
Previous Message Chao Li 2026-06-02 05:57:09 Re: Fix regression in vacuumdb --analyze-in-stages for partitioned tables