| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix race in ReplicationSlotRelease for ephemeral slots |
| Date: | 2026-06-11 09:22:24 |
| Message-ID: | CAA4eK1LqFBKCkX2eoX3iQPxJJnzWTaCpdh9zNotxuoG8BgjdtA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Sat, Jun 6, 2026 at 3:05 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Friday, June 5, 2026 8:45 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > On Wed, Jun 3, 2026 at 8:03 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jun 2, 2026 at 3:00 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > >
> > > /* Drop the local slot if it is not required to be retained. */
> > > if (!local_sync_slot_required(local_slot, remote_slot_list))
> > > {
> > > + bool dropped = false;
> > > + NameData slot_name = {0};
> > > + Oid slot_database = local_slot->data.database;
> > > bool synced_slot;
> > >
> > > Is it really safe to read slot_database before acquiring the database lock?
> >
> > Reading slot_database before taking the database lock seems not
> > inherently unsafe by itself. The comment suggests that the lock is
> > primarily used to prevent conflicts with the startup process running
> > ReplicationSlotsDropDBSlots() during db-drop replay; it does not
> > protect replication slot array reuse.
> >
> > The unsafe part could be reading slot_database from local_slot after
> > ReplicationSlotControlLock has been released. At this point, the slot
> > array cell may already have been freed and reused, so the value read
> > may no longer belong to the slot that get_local_synced_slots()
> > originally collected. As a result, we could end up locking the wrong
> > database.
> >
> > There seem to be two related issues:
> >
> > 1) Before drop: reading local_slot->data.database /
> > local_slot->data.name after the slot-array lock was released, before
> > verifying the cell still represents the same synced slot.
>
> I recall condition (1) is considered acceptable, since the database lock is
> released immediately after re-verifying that the slot is no longer the original
> 'synced' one anyway. Additionally, this race can only occur when replaying a
> DROP DATABASE, which is rare in practice. Since we only take a shared lock, it
> does not seem to cause real issues.
>
It seems that (1) is talking about the access to local_slot->data.name
in local_sync_slot_required(), which happens before we acquire the
database lock, whereas your response doesn't seem to address that
concern. If not, then how exactly does the database lock protect what
we are doing in local_sync_slot_required()?
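
To make the concern concrete, here is a minimal standalone sketch (not
PostgreSQL code, and not the proposed patch) of the general pattern being
discussed: copy the identifying fields of an array cell while the array
lock is held, and later re-check under the same lock that the cell still
holds the same slot before acting on it. All Demo* names and
demo_control_lock are hypothetical stand-ins; a pthread mutex plays the
role of ReplicationSlotControlLock here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_NAMELEN 64

/* Hypothetical stand-in for one cell of the replication slot array. */
typedef struct DemoSlot
{
    bool         in_use;
    bool         synced;
    char         name[DEMO_NAMELEN];
    unsigned int database;
} DemoSlot;

/* Identity of a slot, copied out while the array lock is held. */
typedef struct DemoSlotId
{
    char         name[DEMO_NAMELEN];
    unsigned int database;
} DemoSlotId;

/* Hypothetical stand-in for ReplicationSlotControlLock. */
static pthread_mutex_t demo_control_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Copy name and database while the cell cannot be freed and reused.
 * Returns false if the cell does not hold a synced slot.
 */
static bool
demo_snapshot_slot(const DemoSlot *slot, DemoSlotId *id)
{
    bool ok;

    assert(slot != NULL && id != NULL);

    pthread_mutex_lock(&demo_control_lock);
    ok = slot->in_use && slot->synced;
    if (ok)
    {
        memcpy(id->name, slot->name, DEMO_NAMELEN);
        id->database = slot->database;
    }
    pthread_mutex_unlock(&demo_control_lock);

    return ok;
}

/*
 * Re-check, under the lock, that the cell still holds the slot whose
 * identity was captured earlier.
 */
static bool
demo_slot_still_matches(const DemoSlot *slot, const DemoSlotId *id)
{
    bool same;

    assert(slot != NULL && id != NULL);

    pthread_mutex_lock(&demo_control_lock);
    same = slot->in_use && slot->synced &&
        slot->database == id->database &&
        strncmp(slot->name, id->name, DEMO_NAMELEN) == 0;
    pthread_mutex_unlock(&demo_control_lock);

    return same;
}
```

In this sketch, any decision made from the copied name or database is
only trusted once demo_slot_still_matches() returns true. Whether the
actual code needs this, or whether the database lock already covers the
access in local_sync_slot_required(), is the question above.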
--
With Regards,
Amit Kapila.