| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
| Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix race in ReplicationSlotRelease for ephemeral slots |
| Date: | 2026-06-03 12:03:39 |
| Message-ID: | CAHGQGwE+2WSqiAYgNJRkf_twdB+uRGozjjGhUn76vUKZ8dzbSA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Jun 2, 2026 at 3:00 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Sat, May 30, 2026 at 4:14 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > I'm not sure if adding an injection point for this rare case is worthwhile. Even
> > if we were to add one, future refactoring of that function could shift the
> > position of the injection point, so its long-term usefulness is uncertain. I
> > don't have a strong opinion on this, so I'll leave it to Fujii-San to decide.
I've pushed the patch. Thanks!
IMO the proposed test looks a bit too narrow, so I'm not sure it's worth
adding at this point. For now, I've committed only the code fix.
> There's an adjacent bug around drop_local_obsolete_slots. The root
> cause of them looks similar -- ReplicationSlot * is a pointer to a
> reusable shared-memory array cell, not a durable identity for the same
> slot. In drop_local_obsolete_slots, the issue is that the slot has
> been freed after ReplicationSlotDropAcquired(false); however, another
> backend may reuse the same cell before the unlock/log reads. This
> seems less severe -- it does not normally corrupt slot state, because
> the code only read after the drop. But it can still unlock/log
> misusing the identity of a different slot. Attached a test using
> injection point to reproduce it and a patch to fix it.
Thanks again for the report and patch!
/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list))
{
+ bool dropped = false;
+ NameData slot_name = {0};
+ Oid slot_database = local_slot->data.database;
bool synced_slot;
Is it really safe to read slot_database before acquiring the database lock?
BTW, I'm also wondering whether it's safe for
local_sync_slot_required() to access local_slot without holding a lock.
Regards,
--
Fujii Masao
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jakub Wartak | 2026-06-03 12:46:56 | Re: Heads Up: cirrus-ci is shutting down June 1st |
| Previous Message | Jakub Wartak | 2026-06-03 11:30:01 | Re: Heads Up: cirrus-ci is shutting down June 1st |