| From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, suyu(dot)cmj <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, bharath(dot)rupireddyforpostgres <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Subject: | RE: Newly created replication slot may be invalidated by checkpoint |
| Date: | 2025-12-11 07:09:21 |
| Message-ID: | TY4PR01MB169074BB06FEE03F4E15F4BA494A1A@TY4PR01MB16907.jpnprd01.prod.outlook.com |
| Lists: | pgsql-hackers |
> -----Original Message-----
> From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> Sent: Tuesday, December 9, 2025 7:33 PM
> To: Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>
> Cc: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>;
> pgsql-hackers(at)lists(dot)postgresql(dot)org;
> suyu.cmj <mengjuan(dot)cmj(at)alibaba-inc(dot)com>; tomas <tomas(at)vondra(dot)me>;
> michael <michael(at)paquier(dot)xyz>;
> bharath.rupireddyforpostgres <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>;
> Alexander Korotkov <aekorotkov(at)gmail(dot)com>;
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> Subject: Re: Newly created replication slot may be invalidated by checkpoint
>
> On Mon, Dec 8, 2025 at 3:54 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > > On Monday, December 8, 2025 5:47 PM Amit Kapila
> > > <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Dec 8, 2025 at 12:53 PM Masahiko Sawada
> > > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, Dec 5, 2025 at 4:10 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > > wrote:
> > > > >
> > > > > On Thu, Dec 4, 2025 at 12:12 PM Zhijie Hou (Fujitsu)
> > > > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > > > >
> > > > > > Here are the updated patches for HEAD and 18. I did not add tests
> > > > > > since, after applying the patch and resolving the issue, the only
> > > > > > observable behavior is that the checkpoint will wait for another
> > > > > > backend to create a slot due to the lwlock, so it does not seem
> > > > > > worth testing solely for the lwlock wait event (I could not find
> > > > > > similar tests).
> > > > > >
> > > > >
> > > > > Fair enough. The patch looks mostly good to me; attached are minor
> > > > > comment improvements atop the HEAD patch. I'll do some more testing
> > > > > before push.
> > > > >
> > > > > Sawada-san/Vitaly, do you have any opinion on the patch or the
> > > > > direction of the fix? The idea is to get this fixed for HEAD and 18,
> > > > > then continue the discussion for the other back-branches and the
> > > > > remaining patches.
> > > >
> > > > +1
> > > >
> > >
> > > Thanks, Pushed. I'll continue thinking about how to fix it in branches
> > > prior to 18, and about the other problems reported in this thread.
> >
> > Thanks for pushing. I thought about whether it's possible to apply a
> > similar fix to the back-branches, and one approach could be to take
> > ReplicationSlotAllocationLock in two places: acquire an exclusive lock
> > during WAL reservation, and a shared lock during the minimum LSN
> > calculation at checkpoints, to serialize the two.
> >
>
> + *
> + * Additionally, acquire the Allocation lock to serialize the minimum LSN
> + * calculation with concurrent slot WAL reservation. This ensures that the
> + * WAL position being reserved is either included in the minimum LSN or is
> + * at or beyond the redo pointer of the current checkpoint (see
> + * ReplicationSlotReserveWal for details).
> */
> + LWLockAcquire(ReplicationSlotAllocationLock, LW_SHARED);
> slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
> + LWLockRelease(ReplicationSlotAllocationLock);
>
> Yeah, this will fix the reported issue, but doesn't it look odd to take
> an unrelated lock here? I mean, it makes it appear that anyone who calls
> XLogGetReplicationSlotMinimumLSN() should acquire
> ReplicationSlotAllocationLock in LW_SHARED mode. If we want to go in
> this direction and don't have a better idea for the fix, then we should
> add comments noting that this is a special case and shouldn't be used as
> an example for other places.
I tried to add some comments to that effect in the v10 patch.
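
For reference, the reservation side of that serialization would look roughly
as follows. This is only a sketch of the idea, assuming the usual shape of
ReplicationSlotReserveWal; the variable names are illustrative and the actual
v10 hunk may differ:

    /*
     * Hold ReplicationSlotAllocationLock exclusively while choosing the
     * slot's initial restart_lsn. A concurrent checkpoint takes the same
     * lock in shared mode before computing the minimum required LSN, so it
     * either sees this restart_lsn or starts only after the reservation is
     * visible.
     */
    LWLockAcquire(ReplicationSlotAllocationLock, LW_EXCLUSIVE);

    restart_lsn = GetXLogInsertRecPtr();    /* WAL position being reserved */

    SpinLockAcquire(&slot->mutex);
    slot->data.restart_lsn = restart_lsn;
    SpinLockRelease(&slot->mutex);

    /* Publish the new value before a checkpoint can read the minimum. */
    ReplicationSlotsComputeRequiredLSN();

    LWLockRelease(ReplicationSlotAllocationLock);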
>
> The other idea to fix this problem, suggested by Alexander in his email
> [1], is to introduce a new ReplicationSlotReserveWALLock for this
> purpose. I think introducing an LWLock in back branches could be
> questionable. Did you evaluate the pros and cons of using that
> approach?
I reviewed that approach, and I think the main distinction lies in whether we
serialize the process with a new LWLock or rely on an existing lock.
Introducing a new LWLock in back branches would alter the size of
MainLWLockArray and affect NUM_INDIVIDUAL_LWLOCKS/LWTRANCHE_FIRST_USER_DEFINED.
Although this may not directly impact user applications, since extensions
typically use standard APIs like RequestNamedLWLockTranche and
LWLockNewTrancheId to add private LWLocks, it still carries a slight risk.
Additionally, using an existing lock keeps the code similar to HEAD, which can
be helpful for future bug fixes and analysis.
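
For context, out-of-core code normally obtains LWLocks through the
named-tranche APIs rather than by indexing into MainLWLockArray, which is why
most extensions should survive a change in the array's size; a minimal sketch
(the extension name is hypothetical):

    /* shmem_request_hook: reserve one LWLock in a private named tranche */
    RequestNamedLWLockTranche("my_extension", 1);

    /* shmem_startup_hook: look the lock up by name, not by array offset */
    LWLock *my_lock = &(GetNamedLWLockTranche("my_extension")[0].lock);

    /* used like any other LWLock afterwards */
    LWLockAcquire(my_lock, LW_EXCLUSIVE);
    /* ... critical section ... */
    LWLockRelease(my_lock);

That said, anything compiled against the old array layout would still observe
different offsets, hence the slight risk mentioned above.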
> Yet another possibility is that we don't fix this in back branches
> prior to 18, but I am not sure how frequently it can impact users. Suyu,
> can you please tell us how you found this problem in the first place?
> Was it via code review, or did you hit it in production or while doing
> some related tests?
>
> BTW, I have asked a question regarding commit 2090edc6f32f652a2c in
> email [2]. Did you get a chance to look at that?
Please refer to the next inline reply.
> + /*
> + * Recalculate the current minimum LSN to be used in the WAL segment
> + * cleanup. Then, we must synchronize the replication slots again in
> + * order to make this LSN safe to use.
> + */
> + slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
> + CheckPointReplicationSlots(shutdown);
> +
> /*
> * Some slots have been invalidated; recalculate the old-segment
> * horizon, starting again from RedoRecPtr.
> */
> XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
> - KeepLogSeg(recptr, &_logSegNo);
> + KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
>
>
>
> After invalidating the slots, we recalculate slotsMinReqLSN with the
> latest value of XLogGetReplicationSlotMinimumLSN(). Can't that pick up a
> more recent restart_lsn that has not yet been flushed, so that we may
> end up removing the corresponding WAL?
Since CheckPointReplicationSlots() is called immediately after recalculating
the slots' minimum LSN, any dirty slot's restart_lsn is flushed to disk before
WAL removal takes place. So, I think WAL segments are removed only on the
basis of restart_lsn values that have already been flushed.
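
To spell out the ordering (the calls are from the quoted hunk; the comments
are mine and only sketch the reasoning):

    /* 1. Read the current in-memory minimum of all slots' restart_lsn. */
    slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();

    /*
     * 2. Flush every dirty slot, making each restart_lsn that contributed
     *    to the minimum above durable before it is acted upon.
     */
    CheckPointReplicationSlots(shutdown);

    /*
     * 3. Only now compute the removal horizon; it is clamped by a minimum
     *    whose inputs are already on disk, so no WAL still needed by an
     *    unflushed restart_lsn can be removed.
     */
    XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
    KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);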
>
> [1] - https://www.postgresql.org/message-id/CAPpHfduZY7_pRCrbLdsLty4zP5x2EDmwk4CYiofiyjdt1iK%2BzA%40mail.gmail.com
> [2] - https://www.postgresql.org/message-id/CAA4eK1%2BwrNSee6PKQ0%2BDtUu_W0GdvewskpAEK5EiX6r3E%2B2Sxw%40mail.gmail.com
Best Regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v10_PG17-0001-Prevent-invalidation-of-newly-created-repli.patch | application/octet-stream | 8.0 KB |