| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix race in ReplicationSlotRelease for ephemeral slots |
| Date: | 2026-06-02 06:00:29 |
| Message-ID: | CABPTF7VyH1-W2xnDspECDEzFGQj=WTFpZBCqKfM11OAZa6gQHQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On Sat, May 30, 2026 at 4:14 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Saturday, May 30, 2026 1:44 AM Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> wrote:
>
> > > On Wed, May 27, 2026 at 5:20 PM Zhijie Hou (Fujitsu) <mailto:houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > I haven't attached a test for this fix, as the change is straightforward and the
> > > Likelihood of encountering this bug is low, so it may not be worth adding test
> > > cycles for it. However, if others feel differently, I'm OK to add one.
> >
> > +1 for a test. The fix is just an else, so a future refactor could change it and silently
> > reintroduce the corruption, since it scribbles on an unrelated reused slot, nothing
> > would catch it. Injection points make it deterministic; I've attached a diff patch that adds
> > a test that fails without the fix and passes with it.
>
> Thanks for the test.
>
> I'm not sure if adding an injection point for this rare case is worthwhile. Even
> if we were to add one, future refactoring of that function could shift the
> position of the injection point, so its long-term usefulness is uncertain. I
> don't have a strong opinion on this, so I'll leave it to Fujii-San to decide.
Thanks for reporting this issue. I can reproduce it with lldb on Mac.
postgres=# SELECT pg_create_logical_replication_slot(
postgres(# 'test_slot_dropped',
postgres(# 'pgoutput2',
postgres(# false,
postgres(# false,
postgres(# true
postgres(# );
ERROR: could not access file "pgoutput2": No such file or directory
787 * decoding be disabled.
788 */
789 ReplicationSlotDropAcquired(is_logical);
-> 790 }
791
792 /*
793 * If slot needed to temporarily restrain both data and catalog xmin to
Target 0: (postgres) stopped.
(lldb) expr -- slot->data.name.data
(char[64]) $0 = "test_slot_created"
(lldb) expr -- slot->data.persistency
(ReplicationSlotPersistency) $1 = RS_PERSISTENT
(lldb) expr -- slot->active_proc
(ProcNumber) $2 = 126
The fix looks good to me.
There's an adjacent bug around drop_local_obsolete_slots. The root
cause of them looks similar -- ReplicationSlot * is a pointer to a
reusable shared-memory array cell, not a durable identity for the same
slot. In drop_local_obsolete_slots, the issue is that the slot has
been freed after ReplicationSlotDropAcquired(false); however, another
backend may reuse the same cell before the unlock/log reads. This
seems less severe -- it does not normally corrupt slot state, because
the code only read after the drop. But it can still unlock/log
misusing the identity of a different slot. Attached a test using
injection point to reproduce it and a patch to fix it.
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
| Attachment | Content-Type | Size |
|---|---|---|
| repro_slotsync_stale_pointer.pl | text/x-perl-script | 3.2 KB |
| v1-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patch | application/octet-stream | 3.0 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Amit Kapila | 2026-06-02 06:14:46 | Re: pg_createsubscriber: allow duplicate publication names |
| Previous Message | Chao Li | 2026-06-02 05:57:09 | Re: Fix regression in vacuumdb --analyze-in-stages for partitioned tables |