Re: Fix race in ReplicationSlotRelease for ephemeral slots

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
Date: 2026-06-19 12:08:37
Message-ID: CAA4eK1JBBcS9R3m8nR93E5P-WxRwRx=AM+STZrp9g1Ma13kfag@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 18, 2026 at 2:06 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> OK, how about elaborating on it a bit, like this:
>
> /*
> * In the small window between getting the slot to drop and
> * locking the database, there is a possibility of a parallel
> * database drop by the startup process and the creation of a new
> * slot by the user. This new user-created slot may end up using
> * the same shared memory as that of 'local_slot'.
> *
> * If that happens, local_slot now describes the replacement slot:
> * local_sync_slot_required() may have made its drop decision using
> * the replacement slot's name or invalidation state, and slot_database
> * may refer to the replacement slot's database. Thus check if
> * local_slot is still a synced slot before performing the actual drop.
> * This does not prove it is the original slot, but it prevents dropping
> * an ordinary user-created replacement slot, and the copied database OID
> * keeps lock/unlock symmetric. The remaining risk is limited to this
> * cleanup cycle, such as briefly holding an unrelated database lock, and
> * is acceptable here because this race is rare.
> */
>

Okay, inspired by your and Fujii-san's versions, here is a third version:
/*
* In the small window between getting the slot to drop and
* locking the database, there is a possibility of a parallel
* database drop by the startup process and the creation of a new
* slot by the user. This new user-created slot may end up using
* the same shared memory as that of 'local_slot'.
*
* Because local_slot still points to a reusable slot-array entry,
* its fields (name, database OID, invalidation state) may already
* describe such a replacement slot by the time we reach here. That
* means the drop decision made by local_sync_slot_required() above
* could have been based on the replacement slot's data, and
* slot_database could refer to an unrelated database. The recheck
* below keeps us from actually dropping a user-created replacement
* slot; the residual risk is confined to this cycle (for example,
* briefly locking an unrelated database) and is acceptable because
* the race is rare and non-fatal.
*/

Thoughts?

--
With Regards,
Amit Kapila.
