| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "suyu(dot)cmj" <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, "bharath(dot)rupireddyforpostgres" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Subject: | Re: Newly created replication slot may be invalidated by checkpoint |
| Date: | 2026-01-07 09:22:26 |
| Message-ID: | CALDaNm1svGKR7c8G2rh7ytgik2Cy9J8Oz7kRRUmsh7eL0w1RGw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, 5 Jan 2026 at 15:01, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Dec 30, 2025 at 5:31 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Dec 14, 2025 at 8:21 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Dec 11, 2025 at 12:39 PM Zhijie Hou (Fujitsu)
> > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > >
> > > > >
> > > > > The other idea to fix this problem is suggested by Alexander in his
> > > > > email [1] which is to introduce a new ReplicationSlotReserveWALLock
> > > > > for this purpose. I think introducing LWLock in back branches could be
> > > > > questionable. Did you evaluate the pros and cons of using that
> > > > > approach?
> > > >
> > > > I reviewed that approach, and I think the main distinction lies in whether to
> > > > use a new LWLock to serialize the process or rely on an existing lock.
> > > > Introducing a new LWLock in back branches would alter the size of
> > > > MainLWLockArray and affect NUM_INDIVIDUAL_LWLOCKS/LWTRANCHE_FIRST_USER_DEFINED.
> > > > Although this may not directly impact user applications since users typically
> > > > use standard APIs like RequestNamedLWLockTranche and LWLockNewTrancheId to add
> > > > private LWLocks, it still has a slight risk. Additionally, using an existing
> > > > lock could keep code similarity with the HEAD, which can be helpful for future
> > > > bug fixes and analysis.
> > > >
> > >
> > > Fair enough. I'll wait for Sawada-san/Vitaly to see what their opinion
> > > on this matter is.
> >
> > While it's hacky that the proposed approach takes
> > ReplicationSlotAllocationLock before
> > XLogGetReplicationSlotMinimumLSN() during checkpoint, I find that it's
> > better than introducing a new LWLock in minor versions which could
> > lead unexpected compatibility issues.
> >
> > Regarding the v10 patch, do we need to take
> > ReplicationSlotAllocationLock also at the following place?
> >
> > /*
> > * Recalculate the current minimum LSN to be used in the WAL segment
> > * cleanup. Then, we must synchronize the replication slots again in
> > * order to make this LSN safe to use.
> > */
> > slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
> > CheckPointReplicationSlots(shutdown);
> >
>
> No, we don't need the allocation lock here because RedoRecPtr won't
> change after the previous computation so the WAL reservation during
> creation of slots won't be impacted. I mean if the slot reservation
> starts just before this computation, it should use the latest (this
> checkpoint cycle's RedoRecPtr) whereas that was not true with the case
> where the patch acquires it.
>
> > I think we need to add some comments regardless of taking the lwlock.
> >
>
> I have added a comment though not sure if it is required.
Here is a version which includes back branch version patches with
pgindent changes. I have verified the following scenario in PG17,
PG16, PG15 and PG14 branches and works as expected with the patches:
1. Start a backend to create a slot (s) but block it before updating
the slot.restart_lsn in ReplicationSlotReserveWal():
select pg_create_logical_replication_slot('s', 'test_decoding');
2. start another backend to generate some new WAL files.
select pg_switch_wal();
select pg_switch_wal();
select pg_switch_wal();
3. execute checkpoint but block it before calling
InvalidateObsoleteReplicationSlots()
4. Release the backend to create slot (s).
5. release the checkpoint. The slot will be invalidated.
The v12_PG15 patch can be used for the PG14 branch too.
Regards,
Vignesh
| Attachment | Content-Type | Size |
|---|---|---|
| v12_PG17-0001-Prevent-invalidation-of-newly-created-repli.patch | application/octet-stream | 9.2 KB |
| v12_PG15-0001-Prevent-invalidation-of-newly-created-repli.patch | application/octet-stream | 9.1 KB |
| v12_PG16-0001-Prevent-invalidation-of-newly-created-repli.patch | application/octet-stream | 9.1 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-01-07 09:30:16 | Re: Remove deprecated role membership options from psql help for CREATE USER/GROUP |
| Previous Message | John Naylor | 2026-01-07 09:17:17 | Re: [PATCH] Documentation |