| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix race in ReplicationSlotRelease for ephemeral slots |
| Date: | 2026-06-17 07:29:09 |
| Message-ID: | CABPTF7UCqndPh8jucFtWBpFMoA2oQkSObQGXVVQNVGMZ1q-DCg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, Jun 16, 2026 at 8:46 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Fri, Jun 12, 2026 at 7:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > I feel even if there is an argument to do such a refactoring, it can
> > be done separately. We can push forward with 0001 and then do more
> > discussion for 0002, if required. I can take care of 0001 unless
> > Fujii-San wishes to take care of it?
>
> Yeah, please feel free to work on 0001.
>
> Regarding 0002, since the race is very rare and non-fatal, I'm okay
> with accepting the risk rather than adding more refactoring just to
> avoid it.
>
> I'm a bit tempted to add a source comment explaining the risk and
> why we accept it, though, so other developers can understand
> the tradeoff. For example:
>
> diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
> index 05637344363..ca49f20e7d9 100644
> --- a/src/backend/replication/logical/slotsync.c
> +++ b/src/backend/replication/logical/slotsync.c
> @@ -560,6 +560,12 @@ drop_local_obsolete_slots(List *remote_slot_list)
>   * the same shared memory as that of 'local_slot'. Thus check if
>   * local_slot is still the synced one before performing the actual
>   * drop.
> + *
> + * Because local_slot still points to a reusable slot-array entry,
> + * fields such as name or database OID could already be stale here.
> + * That could cause an incorrect cleanup decision for this cycle or
> + * briefly lock an unrelated database. We accept that risk because
> + * this race is rare and non-fatal.
>   */
>  SpinLockAcquire(&local_slot->mutex);
>  synced_slot = local_slot->in_use && local_slot->data.synced;
Thanks for suggesting the comment! It helps to clarify the situation
and the trade-off we made here. I tweaked it a bit and added it to the
patches prepared by Zhijie.
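
For anyone skimming the thread, here is a minimal, single-threaded C sketch of the situation the comment describes. It is not the slotsync.c code: DemoSlot, demo_fill_entry, demo_flags_ok, demo_identity_matches and demo_run are hypothetical names made up for illustration, and the spinlock is left out because nothing here runs concurrently. It only shows that a pointer into a reusable array entry can still pass an "in_use && synced" check after the entry has been reused for a different slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_NAME_LEN 64
#define DEMO_MAX_SLOTS 4

/* Hypothetical, simplified stand-in for one shared slot-array entry. */
typedef struct DemoSlot
{
	bool		in_use;
	bool		synced;
	char		name[DEMO_NAME_LEN];
	unsigned int database;		/* stand-in for the database OID */
} DemoSlot;

static DemoSlot demo_slots[DEMO_MAX_SLOTS];

/* Simulates creating a slot in an array entry, or reusing a freed entry. */
static void
demo_fill_entry(DemoSlot *slot, const char *name, unsigned int db, bool synced)
{
	slot->in_use = true;
	slot->synced = synced;
	strncpy(slot->name, name, DEMO_NAME_LEN - 1);
	slot->name[DEMO_NAME_LEN - 1] = '\0';
	slot->database = db;
}

/* Flag-only recheck, analogous in shape to the check in the quoted hunk. */
static bool
demo_flags_ok(const DemoSlot *slot)
{
	return slot->in_use && slot->synced;
}

/* Hypothetical stricter check, shown only for contrast. */
static bool
demo_identity_matches(const DemoSlot *slot, const char *name, unsigned int db)
{
	return demo_flags_ok(slot) &&
		slot->database == db &&
		strcmp(slot->name, name) == 0;
}

/*
 * Returns a bitmask: bit 0 is set if the flag-only check passes after the
 * entry was reused, bit 1 is set if the identity check passes.
 */
int
demo_run(void)
{
	DemoSlot   *local_slot = &demo_slots[0];
	char		seen_name[DEMO_NAME_LEN];
	unsigned int seen_db;
	int			result = 0;

	/* The caller looks at a synced slot and remembers what it saw. */
	demo_fill_entry(local_slot, "slot_a", 5, true);
	memcpy(seen_name, local_slot->name, DEMO_NAME_LEN);
	seen_db = local_slot->database;

	/* Meanwhile the entry is released and reused for another synced slot. */
	demo_fill_entry(local_slot, "slot_b", 7, true);

	if (demo_flags_ok(local_slot))
		result |= 1;
	if (demo_identity_matches(local_slot, seen_name, seen_db))
		result |= 2;
	return result;
}
```

In this sketch the flag-only check still succeeds on the reused entry while the remembered name and database no longer match, which is the stale-field window the comment accepts as rare and non-fatal.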
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
| Attachment | Content-Type | Size |
|---|---|---|
| v3_PG17-0001-Avoid-stale-slot-access-after-dropping-obsol.patch | application/octet-stream | 3.6 KB |
| v3-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patch | application/octet-stream | 3.8 KB |
| v3_PG18-0001-Avoid-stale-slot-access-after-dropping-obsol.patch | application/octet-stream | 3.6 KB |