| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Xuneng Zhou <xunengzhou(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix race in ReplicationSlotRelease for ephemeral slots |
| Date: | 2026-06-16 12:45:55 |
| Message-ID: | CAHGQGwGGyEDL3dh7uJ6qPsGvnq4QK_R8+U=12CaprnzwrwaLGA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Fri, Jun 12, 2026 at 7:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I feel even if there is an argument to do such a refactoring, it can
> be done separately. We can push forward with 0001 and then do more
> discussion for 0002, if required. I can take care of 0001 unless
> Fujii-San wishes to take care of it?
Yeah, please feel free to work on 0001.
Regarding 0002, since the race is very rare and non-fatal, I'm okay
with accepting the risk rather than adding more refactoring just to
avoid it.
I'm a bit tempted to add a source comment explaining the risk and
why we accept it, though, so other developers can understand
the tradeoff. For example:
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 05637344363..ca49f20e7d9 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -560,6 +560,12 @@ drop_local_obsolete_slots(List *remote_slot_list)
              * the same shared memory as that of 'local_slot'. Thus check if
              * local_slot is still the synced one before performing the actual
              * drop.
+             *
+             * Because local_slot still points to a reusable slot-array entry,
+             * fields such as name or database OID could already be stale here.
+             * That could cause an incorrect cleanup decision for this cycle or
+             * briefly lock an unrelated database. We accept that risk because
+             * this race is rare and non-fatal.
              */
             SpinLockAcquire(&local_slot->mutex);
             synced_slot = local_slot->in_use && local_slot->data.synced;
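
To illustrate the recheck-under-lock pattern the comment describes, here is a minimal standalone sketch. It is not PostgreSQL code: DemoSlot and the demo_slot_* functions are hypothetical names, and a pthread mutex stands in for the slot spinlock. It only shows why a pointer into a reusable array entry has to be revalidated before acting on it.

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a reusable slot-array entry (not the real struct). */
typedef struct DemoSlot
{
	pthread_mutex_t mutex;
	bool		in_use;
	bool		synced;
	char		name[64];
} DemoSlot;

/* Copy a name into the entry, always leaving it NUL-terminated. */
static void
demo_slot_set_name(DemoSlot *slot, const char *name)
{
	strncpy(slot->name, name, sizeof(slot->name) - 1);
	slot->name[sizeof(slot->name) - 1] = '\0';
}

/* Initialize an entry as an in-use slot. */
static void
demo_slot_init(DemoSlot *slot, const char *name, bool synced)
{
	pthread_mutex_init(&slot->mutex, NULL);
	slot->in_use = true;
	slot->synced = synced;
	demo_slot_set_name(slot, name);
}

/*
 * Simulate another process dropping the slot and reusing the same array
 * entry for an unrelated, non-synced slot.
 */
static void
demo_slot_recycle(DemoSlot *slot, const char *new_name)
{
	pthread_mutex_lock(&slot->mutex);
	slot->in_use = true;
	slot->synced = false;
	demo_slot_set_name(slot, new_name);
	pthread_mutex_unlock(&slot->mutex);
}

/* Recheck, under the lock, that the entry is still an in-use synced slot. */
static bool
demo_slot_still_synced(DemoSlot *slot)
{
	bool		result;

	pthread_mutex_lock(&slot->mutex);
	result = slot->in_use && slot->synced;
	pthread_mutex_unlock(&slot->mutex);
	return result;
}
```

In this sketch, a caller that looked up the entry earlier would call demo_slot_still_synced() right before dropping. The check catches an entry recycled into a non-synced slot, but any name or OID the caller read before the check may still belong to the old occupant, which is the residual risk the proposed comment documents.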
Regards,
--
Fujii Masao