| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix race in ReplicationSlotRelease for ephemeral slots |
| Date: | 2026-06-11 13:19:33 |
| Message-ID: | CAHGQGwG_3ff4HciHtTZ_uMvbJgSDWsz4Yawj_zQpDG6Yj=Mjng@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, Jun 11, 2026 at 8:18 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> 1. Stale name read in local_sync_slot_required(): The reused cell
> holds a different name. local_sync_slot_required() might return false
> (drop needed). But then the in_use && synced spinlock check sees
> synced = false and skips the actual drop. The wrong decision is
> caught.

Yes, we could skip the actual drop. But then wouldn't we still emit
the log message "dropped replication slot ..." even though no slot was
actually dropped?

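To make the concern concrete, here is a toy model, not the actual slotsync.c code. All names (ToySlot, toy_drop_if_synced, the two handle_* functions, the counters) are made up for illustration, and the flags are plain fields rather than spinlock-protected state. It only shows the difference between logging unconditionally and logging only when the drop really happened:

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for one slot array entry; NOT PostgreSQL's ReplicationSlot. */
typedef struct ToySlot
{
    bool        in_use;
    bool        synced;
    const char *name;
} ToySlot;

static int drops_done;   /* how many slots were really dropped */
static int drop_logs;    /* how many "dropped" messages were emitted */

static void
toy_reset(void)
{
    drops_done = 0;
    drop_logs = 0;
}

/* Models the in_use && synced recheck: true only if a drop took place. */
static bool
toy_drop_if_synced(ToySlot *slot)
{
    if (!(slot->in_use && slot->synced))
        return false;           /* entry was reused; nothing is dropped */
    slot->in_use = false;
    drops_done++;
    return true;
}

/* Logs regardless of the recheck result: the message can be misleading. */
static void
handle_obsolete_always_log(ToySlot *slot)
{
    (void) toy_drop_if_synced(slot);
    drop_logs++;
    printf("dropped replication slot \"%s\"\n", slot->name);
}

/* Logs only when the drop actually happened. */
static void
handle_obsolete_gated_log(ToySlot *slot)
{
    if (toy_drop_if_synced(slot))
    {
        drop_logs++;
        printf("dropped replication slot \"%s\"\n", slot->name);
    }
}
```

With an entry that has been reused by a non-synced slot (in_use = true, synced = false), handle_obsolete_always_log() reports a drop although drops_done stays at 0, while handle_obsolete_gated_log() stays silent. Whether gating the message is the right fix for the real code is a separate question; this is only meant to illustrate the symptom.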
> 2. Wrong database OID read at line 551: The reused cell holds OID_B
> from the new slot. We lock OID_B, then at lines 563–565 we see synced
> = false, skip the drop, and unlock OID_B at line 579. Since no drop
> occurred, the cell is still the same non-synced slot, so the lock and
> unlock see the same OID_B. Symmetric — no lock leak.

What happens if the slot for OID_B is dropped after we lock
OID_B, and then a new slot for OID_C reuses the same array entry? In
that case, wouldn't the later unlock read OID_C from
local_slot->data.database even though the lock was originally taken on
OID_B?

Regards,
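The scenario can be sketched with a small simulation. This is a toy model under stated assumptions, not PostgreSQL code: the lock table is just a counter per OID, toy_reuse_entry() stands in for a concurrent drop followed by reuse of the same array entry, and every identifier here is hypothetical. The second function shows one possible way to keep lock and unlock symmetric, by remembering the OID in a local variable; that is only an illustration, not necessarily what the patch should do:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;

/* Toy stand-in for one slot array entry; NOT PostgreSQL's ReplicationSlot. */
typedef struct ToySlot
{
    bool in_use;
    bool synced;
    Oid  database;
} ToySlot;

#define TOY_MAX_DB 8
static int toy_locks[TOY_MAX_DB];   /* toy lock table: hold count per OID */

static void
toy_reset(void)
{
    for (int i = 0; i < TOY_MAX_DB; i++)
        toy_locks[i] = 0;
}

static void
toy_lock(Oid db)
{
    assert(db < TOY_MAX_DB);
    toy_locks[db]++;
}

static void
toy_unlock(Oid db)
{
    assert(db < TOY_MAX_DB);
    toy_locks[db]--;
}

/* Models a concurrent drop plus reuse of the same entry by another slot. */
static void
toy_reuse_entry(ToySlot *slot, Oid new_db)
{
    slot->in_use = true;
    slot->synced = false;
    slot->database = new_db;
}

/* Re-reads slot->database at unlock time: may unlock a different OID. */
static void
visit_rereading_oid(ToySlot *slot, Oid reuse_db)
{
    toy_lock(slot->database);
    toy_reuse_entry(slot, reuse_db);    /* happens while we hold the lock */
    toy_unlock(slot->database);
}

/* Remembers the OID that was locked and unlocks exactly that one. */
static void
visit_cached_oid(ToySlot *slot, Oid reuse_db)
{
    Oid dboid = slot->database;

    toy_lock(dboid);
    toy_reuse_entry(slot, reuse_db);
    toy_unlock(dboid);
}
```

Starting with database = 2 (playing OID_B) and reusing the entry with 3 (playing OID_C), visit_rereading_oid() leaves toy_locks[2] at 1 and toy_locks[3] at -1, that is, a leaked lock on one database and an unlock of a lock never taken on the other. visit_cached_oid() leaves both at 0.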
--
Fujii Masao