RE: Newly created replication slot may be invalidated by checkpoint

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, suyu(dot)cmj <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, bharath(dot)rupireddyforpostgres <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: RE: Newly created replication slot may be invalidated by checkpoint
Date: 2025-12-02 04:19:09
Message-ID: TY4PR01MB1690756AE2EA4EA70C7F52B7294D8A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tuesday, December 2, 2025 1:03 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Nov 21, 2025 at 12:14 AM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > OK, I think it makes sense to start separate threads.
> >
> > I have split the patches based on the different bugs they
> > address and am sharing them here for reference.
> >
>
> I'm reviewing the 0001 patch and the problem that can be addressed by
> that patch. While the proposed patch addresses the race condition
> between a checkpointing and newly created slot, could the same issue
> happen between the checkpointing and copying a slot? I'm trying to
> understand when we have to acquire ReplicationSlotAllocationLock in an
> exclusive mode in the new lock scheme.

Thanks for reviewing!

I think the situation is somewhat different in copy_replication_slot(). As
noted in the comments[1], it is considered acceptable for WAL preceding the
initial restart_lsn to be removed, because the source slot's latest restart_lsn
is copied again in the second phase, so the WAL that ends up being reserved is
safe. Aside from this specific case, I think it's necessary to acquire
ReplicationSlotAllocationLock when reserving WAL for newly created slots.

[1]

/*
 * We need to prevent the source slot's reserved WAL from being removed,
 * but we don't want to lock that slot for very long, and it can advance
 * in the meantime. So obtain the source slot's data, and create a new
 * slot using its restart_lsn. Afterwards we lock the source slot again
 * and verify that the data we copied (name, type) has not changed
 * incompatibly. No inconvenient WAL removal can occur once the new slot
 * is created -- but since WAL removal could have occurred before we
 * managed to create the new slot, we advance the new slot's restart_lsn
 * to the source slot's updated restart_lsn the second time we lock it.
 */
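
To make the two-phase reasoning in that comment concrete, here is a small
self-contained sketch. It is only a toy model and not PostgreSQL code: the
names ToySlot, ToyWal, toy_checkpoint(), toy_create_slot(),
toy_copy_phase2() and toy_slot_is_safe() are made up for illustration, LSNs
are plain integers, and no locking is modeled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* hypothetical stand-in for XLogRecPtr */

typedef struct ToySlot
{
    bool in_use;                /* visible to the toy checkpoint? */
    Lsn  restart_lsn;           /* oldest WAL position the slot needs */
} ToySlot;

typedef struct ToyWal
{
    Lsn oldest_kept;            /* WAL before this position is gone */
} ToyWal;

/*
 * Toy checkpoint: remove WAL older than the minimum restart_lsn of all
 * slots in use, capped at the current insert position. Removed WAL
 * never comes back, so oldest_kept only moves forward.
 */
static void
toy_checkpoint(ToyWal *wal, const ToySlot *slots, int nslots, Lsn insert_lsn)
{
    Lsn keep = insert_lsn;

    for (int i = 0; i < nslots; i++)
    {
        if (slots[i].in_use && slots[i].restart_lsn < keep)
            keep = slots[i].restart_lsn;
    }
    if (keep > wal->oldest_kept)
        wal->oldest_kept = keep;
}

/* Phase 1: make the new slot visible with a previously copied restart_lsn. */
static void
toy_create_slot(ToySlot *dst, Lsn copied_restart_lsn)
{
    dst->restart_lsn = copied_restart_lsn;
    dst->in_use = true;
}

/* Phase 2: re-read the source slot and advance the new slot to match it. */
static void
toy_copy_phase2(const ToySlot *src, ToySlot *dst)
{
    if (src->restart_lsn > dst->restart_lsn)
        dst->restart_lsn = src->restart_lsn;
}

/* A slot is safe if all WAL from its restart_lsn onwards still exists. */
static bool
toy_slot_is_safe(const ToyWal *wal, const ToySlot *slot)
{
    return slot->restart_lsn >= wal->oldest_kept;
}
```

In this model, if the source slot advances and a checkpoint runs between
reading its restart_lsn and calling toy_create_slot(), the new slot briefly
points at removed WAL. Once the new slot is in use, later checkpoints cannot
remove anything the source slot still needs, so after toy_copy_phase2() the
new slot is safe again.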

Best Regards,
Hou zj
