Re: Newly created replication slot may be invalidated by checkpoint

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "suyu(dot)cmj" <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, "bharath(dot)rupireddyforpostgres" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: Newly created replication slot may be invalidated by checkpoint
Date: 2025-12-02 16:26:18
Message-ID: CAD21AoCDBfCCP7w_S+YnfW7LKNmdEAmY3gC-XP_vGYbiRRQRRQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 1, 2025 at 10:19 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Tuesday, December 2, 2025 1:03 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Nov 21, 2025 at 12:14 AM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > OK, I think it makes sense to start separate threads.
> > >
> > > I have split the patches based on the different bugs they
> > > address and am sharing them here for reference.
> > >
> >
> > I'm reviewing the 0001 patch and the problem that can be addressed by
> > that patch. While the proposed patch addresses the race condition
> > between a checkpointing and newly created slot, could the same issue
> > happen between the checkpointing and copying a slot? I'm trying to
> > understand when we have to acquire ReplicationSlotAllocationLock in an
> > exclusive mode in the new lock scheme.
>
> Thanks for reviewing !
>
> I think the situation is somewhat different in the copy_replication_slot(). As
> noted in the comments[1], it's considered acceptable for WALs preceding the
> initial restart_lsn to be removed since the latest restart_lsn will be copied
> again in the second phase, so latest WAL being reserved is safe.

Right. But does that mean the new slot could be invalidated while
being copied, if the restart_lsn copied in the first phase ends up
older than a new redo pointer set by a concurrent checkpoint? I thought
the problem the 0001 patch is trying to fix is that a slot could end up
being invalidated by a concurrent checkpoint even while it is being
created, so I wonder if the same problem could occur here.
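
To make the interleaving I'm asking about concrete, here is a toy model.
It is not PostgreSQL code: ToyLsn, ToySlot, toy_checkpoint_invalidate()
and toy_copy_race() are hypothetical names, and the rule "invalidate if
restart_lsn is older than the new redo pointer" is an assumed
simplification of what the checkpointer does. It only shows the ordering
in question, not whether the real code is affected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for an LSN; not PostgreSQL's XLogRecPtr. */
typedef uint64_t ToyLsn;

/* Toy stand-in for a replication slot (hypothetical, heavily reduced). */
typedef struct ToySlot
{
    ToyLsn restart_lsn;
    bool   invalidated;
} ToySlot;

/*
 * Assumed simplification: the checkpoint treats WAL older than 'cutoff'
 * as removable and invalidates a slot whose restart_lsn is older.
 */
void
toy_checkpoint_invalidate(ToySlot *slot, ToyLsn cutoff)
{
    if (slot->restart_lsn < cutoff)
        slot->invalidated = true;
}

/*
 * Models the ordering in question:
 *   1. the copy takes the source's restart_lsn (first phase),
 *   2. a concurrent checkpoint computes a new redo pointer and scans slots,
 *   3. the second phase of the copy has not run yet.
 * Returns true if the copy is invalidated at step 2.
 */
bool
toy_copy_race(ToyLsn first_phase_restart_lsn, ToyLsn new_redo_ptr)
{
    ToySlot copy = { first_phase_restart_lsn, false };

    toy_checkpoint_invalidate(&copy, new_redo_ptr);
    return copy.invalidated;
}

/* Small self-check of the toy model. */
void
toy_self_check(void)
{
    assert(toy_copy_race(100, 200));   /* copied LSN behind redo pointer */
    assert(!toy_copy_race(200, 100));  /* copied LSN ahead of redo pointer */
}
```

In this model the copy is lost whenever step 2 runs with a redo pointer
past the first-phase restart_lsn, even though the second phase would
have copied a newer restart_lsn afterwards.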

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
