| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix race in ReplicationSlotRelease for ephemeral slots |
| Date: | 2026-06-16 09:35:39 |
| Message-ID: | CAA4eK1K7e1Y2iYkmRZ5CCh0pZOTMUShKDj0nP4nY3Wdcypt7oQ@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, Jun 16, 2026 at 2:24 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Tuesday, June 16, 2026 1:30 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > On Fri, Jun 12, 2026 at 6:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> > >
> > > On Fri, Jun 12, 2026 at 8:22 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com>
> > wrote:
> > > >
> > > > On Thu, Jun 11, 2026 at 9:19 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> > > > In an off-list chat with Zhijie, we thought that briefly holding the
> > > > lock of the wrong database doesn't seem to do much harm, and a
> > > > concurrent database drop that leads to this issue seems rare in
> > > > practice. He noted that the deletion of the slot seems unavoidable,
> > > > because we have to acquire the database lock after releasing the
> > > > replication slot lock to avoid a deadlock with the startup/drop-database
> > > > operation. Therefore, he preferred keeping the design simple and
> > > > avoiding the fatal issue over doing broader refactoring work.
> > > >
> > >
> > > +1. I also think this change is not worth it.
> >
> > I am also OK with the scope of change made by patch 1.
>
> I have one minor comment for the 0001 patch.
>
> + NameData slot_name = {0};
> ...
> SpinLockAcquire(&local_slot->mutex);
> synced_slot = local_slot->in_use && local_slot->data.synced;
> + if (synced_slot)
> + slot_name = local_slot->data.name;
> SpinLockRelease(&local_slot->mutex);
>
> We can defer assigning slot_name until after we pass the existing (synced_slot)
> check. Since it's a synced slot, no other process can change it at that point,
> and we can also skip initializing slot_name. (Please refer to the
> attached patch for suggested changes)
>
+ if (dropped)
+ ereport(LOG,
+ errmsg("dropped replication slot \"%s\" of database with OID %u",
+ NameStr(slot_name),
+ slot_database));

Can we avoid the if (dropped) check by placing this LOG message
immediately after dropping the slot, inside the synced_slot check?
--
With Regards,
Amit Kapila.