| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
| Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix race in ReplicationSlotRelease for ephemeral slots |
| Date: | 2026-06-18 03:19:53 |
| Message-ID: | CAA4eK1KT-FMjXfFdg98qq_04PNnW6T1G-SUNU9dUtn4AC5g86A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Jun 17, 2026 at 12:59 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> On Tue, Jun 16, 2026 at 8:46 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >
> > On Fri, Jun 12, 2026 at 7:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > I feel even if there is an argument to do such a refactoring, it can
> > > be done separately. We can push forward with 0001 and then do more
> > > discussion for 0002, if required. I can take care of 0001 unless
> > > Fujii-San wishes to take care of it?
> >
> > Yeah, please feel free to work on 0001.
> >
> > Regarding 0002, since the race is very rare and non-fatal, I'm okay
> > with accepting the risk rather than adding more refactoring just to
> > avoid it.
> >
> > I'm a bit tempted to add a source comment explaining the risk and
> > why we accept it, though, so other developers can understand
> > the tradeoff. For example:
> >
> > diff --git a/src/backend/replication/logical/slotsync.c
> > b/src/backend/replication/logical/slotsync.c
> > index 05637344363..ca49f20e7d9 100644
> > --- a/src/backend/replication/logical/slotsync.c
> > +++ b/src/backend/replication/logical/slotsync.c
> > @@ -560,6 +560,12 @@ drop_local_obsolete_slots(List *remote_slot_list)
> > * the same shared memory as that of
> > 'local_slot'. Thus check if
> > * local_slot is still the synced one before
> > performing the actual
> > * drop.
> > + *
> > + * Because local_slot still points to a
> > reusable slot-array entry,
> > + * fields such as name or database OID could
> > already be stale here.
> > + * That could cause an incorrect cleanup
> > decision for this cycle or
> > + * briefly lock an unrelated database. We
> > accept that risk because
> > + * this race is rare and non-fatal.
> > */
> > SpinLockAcquire(&local_slot->mutex);
> > synced_slot = local_slot->in_use &&
> > local_slot->data.synced;
>
> Thanks for suggesting the comment! It helps to clarify the situation
> and the trade-off we made here. I tweaked it a bit and added it to the
> patches prepared by Zhijie.
>
+ *
+ * We cannot close this window by holding
+ * ReplicationSlotControlLock while taking the database lock,
+ * because the database-drop path holds the database lock and then
+ * scans replication slots.
The database-drop path acquires ReplicationSlotControlLock in shared
mode, so not sure if the above is completely correct, here you are
going in the direction of trying to defend that no easy solution
exists which needs more thought. Fujii-San's proposal was better but
there also we may need to be a bit more specific about "That could
cause an incorrect cleanup decision ...", otherwise, it makes the
comment unclear.
I am planning to commit and backpatch the fix for the first problem
based on what Hou-San has shared (v2-*), then we can discuss how to
improve the existing comments and if we agree on something, that can
be a HEAD-only patch.
--
With Regards,
Amit Kapila.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-06-18 03:24:04 | pgbench --continue-on-error: clarify TPS and failure reporting |
| Previous Message | Chao Li | 2026-06-18 03:05:58 | Re: IGNORE/RESPECT NULLS can be specified for (prokind == 'f'). |