| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
| Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix race in ReplicationSlotRelease for ephemeral slots |
| Date: | 2026-07-02 04:26:16 |
| Message-ID: | CAA4eK1LJLJzmuo=-fh+7UNvsQoT3gOa=5JXsB7DujYRLuiz_=w@mail.gmail.com |
| Lists: | pgsql-hackers |
On Sat, Jun 20, 2026 at 3:12 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, Jun 20, 2026 at 12:11 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > On Fri, Jun 19, 2026 at 8:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jun 18, 2026 at 2:06 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > > >
> > > > OK, how about elaborating on it a bit like this:
> > > >
> > > > /*
> > > > * In the small window between getting the slot to drop and
> > > > * locking the database, there is a possibility of a parallel
> > > > * database drop by the startup process and the creation of a new
> > > > * slot by the user. This new user-created slot may end up using
> > > > * the same shared memory as that of 'local_slot'.
> > > > *
> > > > * If that happens, local_slot now describes the replacement slot:
> > > > * local_sync_slot_required() may have made its drop decision using
> > > > * the replacement slot's name or invalidation state, and slot_database
> > > > * may refer to the replacement slot's database. Thus check if
> > > > * local_slot is still a synced slot before performing the actual drop.
> > > > * This does not prove it is the original slot, but it prevents dropping
> > > > * an ordinary user-created replacement slot, and the copied database OID
> > > > * keeps lock/unlock symmetric. The remaining risk is limited to this
> > > > * cleanup cycle, such as briefly holding an unrelated database lock, and
> > > > * is acceptable here because this race is rare.
> > > > */
> > > >
> > >
> > > Okay, inspired by your and Fujii-san's versions, here is a third version:
> > > /*
> > > * In the small window between getting the slot to drop and
> > > * locking the database, there is a possibility of a parallel
> > > * database drop by the startup process and the creation of a new
> > > * slot by the user. This new user-created slot may end up using
> > > * the same shared memory as that of 'local_slot'.
> > > *
> > > * Because local_slot still points to a reusable slot-array entry,
> > > * its fields (name, database OID, invalidation state) may already
> > > * describe such a replacement slot by the time we reach here. That
> > > * means the drop decision made by local_sync_slot_required() above
> > > * could have been based on the replacement slot's data, and
> > > * slot_database could refer to an unrelated database. The recheck
> > > * below keeps us from actually dropping a user-created replacement
> > > * slot; the residual risk is confined to this cycle (for example,
> > > * briefly locking an unrelated database) and is acceptable because
> > > * the race is rare and non-fatal.
> > > */
> > >
> > > Thoughts?
> >
> > LGTM. It looks well-articulated.
> >
>
> Thanks, I'll push this as soon as the PG20 branch opens.
>
Pushed.
--
With Regards,
Amit Kapila.
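
The recheck described in the comment above can be illustrated with a small standalone sketch. This is not the PostgreSQL code: `DemoSlot`, `demo_reuse_entry`, and `demo_drop_if_still_synced` are hypothetical names, the database lock is reduced to comments, and the race is simulated by a flag rather than by a real concurrent process. It only shows why rechecking the "synced" property after the lock step protects a user-created replacement slot that has reused the same array entry.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry of a shared slot array. */
typedef struct DemoSlot
{
    bool        in_use;
    bool        synced;     /* true only for slots created by slot sync */
    unsigned    database;   /* OID of the database the slot belongs to */
    char        name[64];
} DemoSlot;

/* Reuse the entry for a new user-created slot, as a concurrent session might. */
static void
demo_reuse_entry(DemoSlot *slot, const char *name, unsigned database)
{
    slot->in_use = true;
    slot->synced = false;
    slot->database = database;
    snprintf(slot->name, sizeof(slot->name), "%s", name);
}

/*
 * Drop the slot only if it is still a synced slot after the lock step.
 * Returns true if the slot was dropped.  'simulate_race' stands in for a
 * concurrent database drop followed by creation of a user slot that lands
 * in the same array entry.
 */
static bool
demo_drop_if_still_synced(DemoSlot *local_slot, bool simulate_race)
{
    /* Copy the OID once so that lock and unlock use the same value. */
    unsigned    slot_database = local_slot->database;
    bool        dropped = false;

    if (simulate_race)
        demo_reuse_entry(local_slot, "user_slot", 99);

    /* A real implementation would lock database 'slot_database' here. */

    /* Recheck: a replacement user-created slot is not marked as synced. */
    if (local_slot->in_use && local_slot->synced)
    {
        local_slot->in_use = false;
        dropped = true;
    }

    /* A real implementation would unlock database 'slot_database' here. */
    (void) slot_database;

    return dropped;
}
```

Calling `demo_drop_if_still_synced(&slot, false)` on a synced entry drops it, while passing `true` leaves the replacement entry in place. As the comment notes, the recheck does not prove the entry is the original slot; it only rules out dropping an ordinary user-created one.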