Quick Links

Re: Newly created replication slot may be invalidated by checkpoint

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "suyu(dot)cmj" <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, "bharath(dot)rupireddyforpostgres" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Newly created replication slot may be invalidated by checkpoint
Date:	2025-12-03 20:33:39
Message-ID:	CAD21AoCMMi6o5zMUWLmt22tAd8NsLJROdy44=R_K5eadCFoK7A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Dec 2, 2025 at 10:15 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wednesday, December 3, 2025 12:26 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Dec 1, 2025 at 10:19 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Tuesday, December 2, 2025 1:03 AM Masahiko Sawada
> > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, Nov 21, 2025 at 12:14 AM Zhijie Hou (Fujitsu)
> > > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > > >
> > > > > OK, I think it makes sense to start separate threads.
> > > > >
> > > > > I have split the patches based on the different bugs they address
> > > > > and am sharing them here for reference.
> > > > >
> > > >
> > > > I'm reviewing the 0001 patch and the problem that can be addressed
> > > > by that patch. While the proposed patch addresses the race condition
> > > > between a checkpointing and newly created slot, could the same issue
> > > > happen between the checkpointing and copying a slot? I'm trying to
> > > > understand when we have to acquire ReplicationSlotAllocationLock in
> > > > an exclusive mode in the new lock scheme.
> > >
> > > Thanks for reviewing !
> > >
> > > I think the situation is somewhat different in the
> > > copy_replication_slot(). As noted in the comments[1], it's considered
> > > acceptable for WALs preceding the initial restart_lsn to be removed
> > > since the latest restart_lsn will be copied again in the second phase, so
> > latest WAL being reserved is safe.
> >
> > Right. But does it mean that the new slot could be invalidated while being
> > copied if the first copied restart_lsn becomes less than a new redo ptr set by a
> > concurrent checkpoint? I thought the problem the
> > 0001 patch is trying to fix is that the slot could end up being invalidated by a
> > concurrent checkpoint even while being created, so I wonder if the same
> > problem could occur.
>
> I think the invalidation cannot occur when copying because:
>
> Currently, there are no CHECK_FOR_INTERRUPTS() calls between the initial
> restart_lsn copy (first phase) and the latest restart_lsn copy (second phase).
> As a result, even if a checkpoint attempts to invalidate a slot and sends a
> SIGTERM to the backend, the backend will first update the restart_lsn during the
> second phase before responding to the signal. Consequently, during the next
> cycle of InvalidatePossiblyObsoleteSlot(), the checkpoint will observe the
> updated restart_lsn and skip the invalidation.
>
> For logical slots, although invoking the output plugin startup callback presents
> a slight chance of processing the signal (when using third-party plugins), slot
> invalidation in this scenario results in immediate slot dropping, because the
> slot is in RS_EPHEMERAL state, thus preventing invalidation.

Thank you for the analysis. I agree.

> While theoretically, slot invalidation could occur if the code changes in the
> future, addressing that possibility could be considered an independent
> improvement task. What do you think ?

Okay. I find that it also might make sense for HEAD to use
RS_EPHEMERAL state for physical slots too to avoid being invalidated
during creation, which probably can be discussed later. For back
branches, the proposed idea of acquiring ReplicationSlotAllocationLock
in an exclusive mode would be better. I think we might want to have a
comment in CheckPointReplicationSlots() too that refers to
ReplicationSlotReserveWal().

Regarding whether we revert the original fix 2090edc6f32 and make it
the same as we did in HEAD ca307d5cec90a4f, we need to change the size
of ReplicationSlot struct. I'm concerned that it's really safe to
change it because the data resides on the shared memory. For example,
we typically iterate over all replication slots as follow:

for (i = 0; i < max_replication_slots; i++)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

I'm concerned that the arithmetic for calculating the slot address is
affected by the size of ReplicationSlot change.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

RE: Newly created replication slot may be invalidated by checkpoint at 2025-12-03 06:15:32 from Zhijie Hou (Fujitsu)

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Nathan Bossart	2025-12-03 20:36:30	Re: [PATCH] Add enable_copy_program GUC to control COPY PROGRAM
Previous Message	Sami Imseih	2025-12-03 20:27:29	Re: Proposal: Add a callback data parameter to GetNamedDSMSegment