Re: Fix race in ReplicationSlotRelease for ephemeral slots

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
Date: 2026-06-16 09:35:39
Message-ID: CAA4eK1K7e1Y2iYkmRZ5CCh0pZOTMUShKDj0nP4nY3Wdcypt7oQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 16, 2026 at 2:24 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Tuesday, June 16, 2026 1:30 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > On Fri, Jun 12, 2026 at 6:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> > >
> > > On Fri, Jun 12, 2026 at 8:22 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com>
> > wrote:
> > > >
> > > > On Thu, Jun 11, 2026 at 9:19 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> > > > In an off-list chat with Zhijie, we thought that briefly holding the
> > > > lock on the wrong database does not do much harm. The concurrent
> > > > drop-database operation that leads to this issue seems rare in
> > > > practice. He stated that the deletion of the slot seems unavoidable
> > > > because we have to acquire the database lock after releasing the
> > > > replication slot lock to avoid a deadlock with the startup/drop db
> > > > operation. Therefore, he preferred keeping the design simple and
> > > > avoiding the fatal issue over doing broader refactoring work.
> > > >
> > >
> > > +1. I also think this change is not worth it.
> >
> > I am also OK with the scope of change made by patch 1.
>
> I have one minor comment for the 0001 patch.
>
> + NameData slot_name = {0};
> ...
> SpinLockAcquire(&local_slot->mutex);
> synced_slot = local_slot->in_use && local_slot->data.synced;
> + if (synced_slot)
> + slot_name = local_slot->data.name;
> SpinLockRelease(&local_slot->mutex);
>
> We can defer assigning slot_name until after we pass the existing (synced_slot)
> check. Since it's a synced slot, no other process can change it at that point,
> and we can also skip initializing slot_name. (Please refer to the
> attached patch for suggested changes)
>

+ if (dropped)
+ ereport(LOG,
+ errmsg("dropped replication slot \"%s\" of database with OID %u",
+ NameStr(slot_name),
+ slot_database));

Can we avoid the if (dropped) check by emitting this LOG message
immediately after dropping the slot, inside the synced_slot check?
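
For illustration only, here is a minimal self-contained sketch of that
shape. It is not the actual PostgreSQL code: DemoName, DemoSlot,
demo_drop_slot() and demo_drop_if_synced() are hypothetical stand-ins,
the spinlock is omitted, and snprintf() into a caller buffer replaces
ereport(LOG, ...). It only shows the slot name being copied and the
message being produced inside the synced-slot path, so that neither a
separate "dropped" flag nor an initializer for the name is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for PostgreSQL's NameData (assumption: simplified). */
typedef struct { char data[64]; } DemoName;

/* Hypothetical, heavily simplified slot; not the real ReplicationSlot. */
typedef struct
{
    bool         in_use;
    bool         synced;
    unsigned int database;
    DemoName     name;
} DemoSlot;

/* Hypothetical helper standing in for the real drop logic. */
static void
demo_drop_slot(DemoSlot *slot)
{
    slot->in_use = false;
}

/*
 * Drop the slot only if it is a synced slot, and build the log message
 * right there. Returns true if the slot was dropped; logbuf is left
 * untouched otherwise.
 */
static bool
demo_drop_if_synced(DemoSlot *slot, char *logbuf, size_t buflen)
{
    bool         synced_slot;
    DemoName     slot_name;     /* assigned only on the synced path */
    unsigned int slot_database;

    assert(slot != NULL);

    /* In the real code this read would happen under the slot's spinlock. */
    synced_slot = slot->in_use && slot->synced;

    if (!synced_slot)
        return false;

    /* Copy what the message needs before the slot goes away. */
    slot_name = slot->name;
    slot_database = slot->database;

    demo_drop_slot(slot);

    /* Message emitted next to the drop, so no "dropped" flag is needed. */
    snprintf(logbuf, buflen,
             "dropped replication slot \"%s\" of database with OID %u",
             slot_name.data, slot_database);
    return true;
}
```

A caller would pass a buffer and check the return value; a slot that is
not synced is left in place and produces no message.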

--
With Regards,
Amit Kapila.
