RE: Fix race in ReplicationSlotRelease for ephemeral slots

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: Fix race in ReplicationSlotRelease for ephemeral slots
Date: 2026-06-16 08:54:11
Message-ID: TY4PR01MB17718F4D0C5C8EB96A303C2E594E52@TY4PR01MB17718.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, June 16, 2026 1:30 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> On Fri, Jun 12, 2026 at 6:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Fri, Jun 12, 2026 at 8:22 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com>
> wrote:
> > >
> > > On Thu, Jun 11, 2026 at 9:19 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> > > In an off-list chat with Zhijie, we kinda thought that holding the
> > > lock of a wrong db for a brief time doesn't seem to harm a lot. The
> > > concurrent dropping-db operation leads to this issue seems rare in
> > > practice. He stated that the deletion of the slot seems unavoidable
> > > because we have to acquire the database lock after releasing the
> > > replication slot lock to avoid the deadlock with the startup/drop db
> > > operation. Therefore, he prefered keeping the design simple and
> > > avoiding the fatal issue over doing a broader refactoring work.
> > >
> >
> > +1. I also think this change is not worth it.
>
> I am also OK with the scope of change made by patch 1.

I have one minor comment for the 0001 patch.

+ NameData slot_name = {0};
...
SpinLockAcquire(&local_slot->mutex);
synced_slot = local_slot->in_use && local_slot->data.synced;
+ if (synced_slot)
+ slot_name = local_slot->data.name;
SpinLockRelease(&local_slot->mutex);

We can defer assigning slot_name until after we pass the existing (synced_slot)
check. Since it's a synced slot, no other process can change it at that point,
and we can also skip initializing slot_name. (Please refer to the
attached patch for suggested changes)

Best Regards,
Hou zj

Attachment Content-Type Size
0001-comments_patch application/octet-stream 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Nidzwetzki 2026-06-16 08:58:52 Re: [PATCH] Fix PITR pause bypass when initial XLOG_RUNNING_XACTS has subxid overflow
Previous Message Kyotaro Horiguchi 2026-06-16 08:28:31 Re: Disable startup progress timeout during standby WAL replay