Quick Links

Re: Newly created replication slot may be invalidated by checkpoint

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc:	Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "suyu(dot)cmj" <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, "bharath(dot)rupireddyforpostgres" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject:	Re: Newly created replication slot may be invalidated by checkpoint
Date:	2025-12-15 04:20:56
Message-ID:	CAA4eK1KhbtdypBc-T=7RaWN7TOuaBN+bojK12=mRbM0HUdJYnQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Dec 11, 2025 at 12:39 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> >
> > The other idea to fix this problem is suggested by Alexander in his
> > email [1] which is to introduce a new ReplicationSlotReserveWALLock
> > for this purpose. I think introducing LWLock in back branches could be
> > questionable. Did you evaluate the pros and cons of using that
> > approach?
>
> I reviewed that approach, and I think the main distinction lies in whether to
> use a new LWLock to serialize the process or rely on an existing lock.
> Introducing a new LWLock in back branches would alter the size of
> MainLWLockArray and affect NUM_INDIVIDUAL_LWLOCKS/LWTRANCHE_FIRST_USER_DEFINED.
> Although this may not directly impact user applications since users typically
> use standard APIs like RequestNamedLWLockTranche and LWLockNewTrancheId to add
> private LWLocks, it still has a slight risk. Additionally, using an existing
> lock could keep code similarity with the HEAD, which can be helpful for future
> bug fixes and analysis.
>

Fair enough. I'll wait for Sawada-san/Vitaly to see what their opinion
on this matter is.

> > Yet, another possibility is that we don't fix this in back branches
> > prior to 18 but not sure how frequently it can impact users. Suyu, can
> > you please tell how you found this problem in the first place? Is it
> > via code-review or did you hit this in the production or while doing
> > some related tests?
> >
> > BTW, I have asked a question regarding commit 2090edc6f32f652a2c in
> > email [2]. Did you get a chance to look at that?
>
> Please refer to the next inline reply.
>
> > + /*
> > + * Recalculate the current minimum LSN to be used in the WAL segment
> > + * cleanup. Then, we must synchronize the replication slots again in
> > + * order to make this LSN safe to use.
> > + */
> > + slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
> > + CheckPointReplicationSlots(shutdown);
> > +
> > /*
> > * Some slots have been invalidated; recalculate the old-segment
> > * horizon, starting again from RedoRecPtr.
> > */
> > XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
> > - KeepLogSeg(recptr, &_logSegNo);
> > + KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
> >
> >
> >
> > After invalidating the slots, we recalculate the slotsMinReqLSN with the
> > latest value of XLogGetReplicationSlotMinimumLSN(). Can't it generate a more
> > recent value of slot's restart_lsn which has not been flushed and we may end
> > up removing the corresponding WAL?
>
> Since CheckPointReplicationSlots() is immediately called after recalculating the
> slot's minimum LSN, it ensures that the dirty slot's restart_lsn is flushed to
> disk before any WAL removal takes place. So, I think only those WALs whose
> removal is based on the flushed restart_lsn value are eliminated.
>

Right. So, we can ignore this point.

--
With Regards,
Amit Kapila.

In response to

RE: Newly created replication slot may be invalidated by checkpoint at 2025-12-11 07:09:21 from Zhijie Hou (Fujitsu)

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	jian he	2025-12-15 04:37:43	Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
Previous Message	David Rowley	2025-12-15 04:18:23	Re: Fixes a typo in tablecmd