| From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
| Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Subject: | RE: Fix race in ReplicationSlotRelease for ephemeral slots |
| Date: | 2026-06-16 08:54:11 |
| Message-ID: | TY4PR01MB17718F4D0C5C8EB96A303C2E594E52@TY4PR01MB17718.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tuesday, June 16, 2026 1:30 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> On Fri, Jun 12, 2026 at 6:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Fri, Jun 12, 2026 at 8:22 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com>
> wrote:
> > >
> > > On Thu, Jun 11, 2026 at 9:19 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> > > In an off-list chat with Zhijie, we kinda thought that holding the
> > > lock of a wrong db for a brief time doesn't seem to harm a lot. The
> > > concurrent dropping-db operation leads to this issue seems rare in
> > > practice. He stated that the deletion of the slot seems unavoidable
> > > because we have to acquire the database lock after releasing the
> > > replication slot lock to avoid the deadlock with the startup/drop db
> > > operation. Therefore, he prefered keeping the design simple and
> > > avoiding the fatal issue over doing a broader refactoring work.
> > >
> >
> > +1. I also think this change is not worth it.
>
> I am also OK with the scope of change made by patch 1.
I have one minor comment for the 0001 patch.
+ NameData slot_name = {0};
...
SpinLockAcquire(&local_slot->mutex);
synced_slot = local_slot->in_use && local_slot->data.synced;
+ if (synced_slot)
+ slot_name = local_slot->data.name;
SpinLockRelease(&local_slot->mutex);
We can defer assigning slot_name until after we pass the existing (synced_slot)
check. Since it's a synced slot, no other process can change it at that point,
and we can also skip initializing slot_name. (Please refer to the
attached patch for suggested changes)
Best Regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-comments_patch | application/octet-stream | 1.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jan Nidzwetzki | 2026-06-16 08:58:52 | Re: [PATCH] Fix PITR pause bypass when initial XLOG_RUNNING_XACTS has subxid overflow |
| Previous Message | Kyotaro Horiguchi | 2026-06-16 08:28:31 | Re: Disable startup progress timeout during standby WAL replay |