RE: Newly created replication slot may be invalidated by checkpoint

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, suyu(dot)cmj <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, bharath(dot)rupireddyforpostgres <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: RE: Newly created replication slot may be invalidated by checkpoint
Date: 2025-12-11 07:28:58
Message-ID: TY4PR01MB1690755E55B220B56911507F694A1A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, December 11, 2025 3:09 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > On Tuesday, December 9, 2025 7:33 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote
> >
> > On Mon, Dec 8, 2025 at 3:54 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Monday, December 8, 2025 5:47 PM Amit Kapila
> > <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Dec 8, 2025 at 12:53 PM Masahiko Sawada
> > > > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, Dec 5, 2025 at 4:10 AM Amit Kapila
> > > > > <amit(dot)kapila16(at)gmail(dot)com>
> > > > wrote:
> > > > > >
> > > > > > On Thu, Dec 4, 2025 at 12:12 PM Zhijie Hou (Fujitsu)
> > > > > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > > > > >
> > > > > > > Here are the updated patches for HEAD and 18. I did not add
> > > > > > > tests since, after applying the patch and resolving the
> > > > > > > issue, the only observable behavior is that the checkpoint
> > > > > > > will wait for another backend to create a slot due to the
> > > > > > > lwlock lock, so it seems not worth to test solely lwlock wait event (I
> could not find similar tests).
> > > > > > >
> > > > > >
> > > > > > Fair enough. The patch looks mostly good to me, attached are
> > > > > > minor comment improvements atop the HEAD patch. I'll do some
> > > > > > more
> > testing
> > > > > > before push.
> > > > > >
> > > > > > Sawada-san/Vitaly, do you have any opinion on patch or the
> > > > > > direction to fix? The idea is to get this fixed for HEAD and
> > > > > > 18, then continue discussion for other bank-branches and the
> remaining patches.
> > > > >
> > > > > +1
> > > > >
> > > >
> > > > Thanks, Pushed. I'll continue thinking on how to fix it in
> > > > branches prior to
> > 18
> > > > and other problems reported in this thread.
> > >
> > > Thanks for pushing. I thought about whether it's possible to apply a
> > > similar
> > fix
> > > to back-branches and one approach could be to take
> > ReplicationSlotAllocationLock
> > > at two places. E.g., acquire an exclusive lock WAL reservation, and
> > > a shared lock during the minimum LSN calculation at checkpoints to
> > > serialize the
> > process.
> > >
> >
> > + *
> > + * Additionally, acquiring the Allocation lock to serialize the
> > + minimum LSN
> > + * calculation with concurrent slot WAL reservation. This ensures
> > + that the
> > + * WAL position being reserved is either included in the miminum LSN
> > + or is
> > + * beyond or equal to the redo pointer of the current checkpoint (See
> > + * ReplicationSlotReserveWal for details).
> > */
> > + LWLockAcquire(ReplicationSlotAllocationLock, LW_SHARED);
> > slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
> > + LWLockRelease(ReplicationSlotAllocationLock);
> >
> > Yeah, this will fix the reported issue but doesn't it look odd to take
> > an unrelated lock here? I mean it appears that if someone has to call
> > XLogGetReplicationSlotMinimumLSN(), they should acquire
> > ReplicationSlotAllocationLock in LW_SHARED mode? If we want to go in
> > this direction and don't have better ideas to fix then we should add
> > comments suggesting this is a special case and shouldn't be used as an
> > example for other places.
>
> I tried to add some comments in v10 patch.
>
> >
> > The other idea to fix this problem is suggested by Alexander in his
> > email [1] which is to introduce a new ReplicationSlotReserveWALLock
> > for this purpose. I think introducing LWLock in back branches could be
> > questionable. Did you evaluate the pros and cons of using that
> > approach?
>
> I reviewed that approach, and I think the main distinction lies in whether to
> use a new LWLock to serialize the process or rely on an existing lock.
> Introducing a new LWLock in back branches would alter the size of
> MainLWLockArray and affect
> NUM_INDIVIDUAL_LWLOCKS/LWTRANCHE_FIRST_USER_DEFINED.
> Although this may not directly impact user applications since users typically
> use standard APIs like RequestNamedLWLockTranche and
> LWLockNewTrancheId to add private LWLocks, it still has a slight risk.
> Additionally, using an existing lock could keep code similarity with the HEAD,
> which can be helpful for future bug fixes and analysis.

BTW, I searched the git history and can only find 2 old commits that adds lwlock
On stable branches, but both of are fixing serious problems such as
data corruption / loss issues.

>
> > Yet, another possibility is that we don't fix this in back branches
> > prior to 18 but not sure how frequently it can impact users. Suyu, can
> > you please tell how you found this problem in the first place? Is it
> > via code-review or did you hit this in the production or while doing
> > some related tests?
> >
> > BTW, I have asked a question regarding commit 2090edc6f32f652a2c in
> > email [2]. Did you get a chance to look at that?
>
> Please refer to the next inline reply.
>
> > + /*
> > + * Recalculate the current minimum LSN to be used in the WAL
> segment
> > + * cleanup. Then, we must synchronize the replication slots again in
> > + * order to make this LSN safe to use.
> > + */
> > + slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
> > + CheckPointReplicationSlots(shutdown);
> > +
> > /*
> > * Some slots have been invalidated; recalculate the old-segment
> > * horizon, starting again from RedoRecPtr.
> > */
> > XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
> > - KeepLogSeg(recptr, &_logSegNo);
> > + KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
> >
> >
> >
> > After invalidating the slots, we recalculate the slotsMinReqLSN with
> > the latest value of XLogGetReplicationSlotMinimumLSN(). Can't it
> > generate a more recent value of slot's restart_lsn which has not been
> > flushed and we may end up removing the corresponding WAL?
>
> Since CheckPointReplicationSlots() is immediately called after recalculating
> the slot's minimum LSN, it ensures that the dirty slot's restart_lsn is flushed to
> disk before any WAL removal takes place. So, I think only those WALs whose
> removal is based on the flushed restart_lsn value are eliminated.
>
> >
> > [1] - https://www.postgresql.org/message-
> >
> id/CAPpHfduZY7_pRCrbLdsLty4zP5x2EDmwk4CYiofiyjdt1iK%2BzA%40mail.gm
> > ail.com
> > [2] - https://www.postgresql.org/message-
> >
> id/CAA4eK1%2BwrNSee6PKQ0%2BDtUu_W0GdvewskpAEK5EiX6r3E%2B2Sxw
> > %40mail.gmail.com

Best Regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2025-12-11 07:29:07 Re: [Proposal] Adding Log File Capability to pg_createsubscriber
Previous Message Bertrand Drouvot 2025-12-11 07:22:35 Re: Mark function arguments of type "T *" as "const T *" where possible