Re: Newly created replication slot may be invalidated by checkpoint

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "suyu(dot)cmj" <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, "bharath(dot)rupireddyforpostgres" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Newly created replication slot may be invalidated by checkpoint
Date: 2025-12-18 11:57:46
Message-ID: CAPpHfdv3t1r7ea7omH=AyM0mRW3-i2A80is-VyLjrO2v0cv79Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 10, 2025 at 11:11 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Nov 5, 2025 at 3:48 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > On Mon, Oct 6, 2025 at 6:46 PM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
> > > There is one subtle thing. Once, the operation of restart_lsn assignment is not
> > > an atomic, the following scenario may happen theoretically:
> > > 1. Read GetRedoRecPtr() in the backend (ReplicationSlotReserveWal)
> > > 2. Assign a new redo LSN in the checkpointer
> > > 3. Call ReplicationSlotsComputeRequiredLSN() in the checkpointer
> > > 3. Assign the old redo LSN to restart_lsn
> > >
> > > In this scenario, the restart_lsn will point to a previous redo LSN and it will
> > > be not protected by the new redo LSN. This scenario is unlikely, but it can
> > > happen theoretically. I have no ideas how to deal with it, except of assigning
> > > restart_lsn under XLogCtl->info_lck lock to avoid concurrent modification of
> > > XLogCtl->RecoRecPtr until it is assigned to restart_lsn of a creating slot.
> > >
> > > In case of recovery, when GetXLogReplayRecPtr() is used, the protection by
> > > redo LSN seems to work as well, because a new redo LSN is taken from the latest
> > > replayed checkpoint. Thus, it is guaranteed that GetXLogReplayRecPtr() will not
> > > be less than the new redo LSN, if it is called right after assignment of redo
> > > LSN in CreateRestartPoint().
> >
> > Thank you for highlighting this scenario. I've reviewed it. I think
> > we could avoid it by covering appropriate parts of
> > ReplicationSlotReserveWal() and Create{Check|Restart}Point() by a new
> > LWLock. The draft patch is attached. What do you think?
> >
>
> The fix seems to be only provided for bank branches, but IIUC the
> problem can happen in HEAD as well. In Head, how about acquiring
> ReplicationSlotAllocationLock in Exclusive mode during
> ReplicationSlotReserveWal? This lock is acquired in SHARE mode in
> CheckPointReplicationSlots. So, this should make our calculations
> correct and avoid invalidating the newly created slot.
>
> I feel with the proposed patches for back branches, the code is
> deviating too much and also makes it a bit complicated, which means it
> could be difficult to maintain it in the future. Can we consider
> reverting the original fix 2090edc6f32f652a2c995ca5f7e65748ae1e4c5d
> and make it the same as we did in HEAD
> ca307d5cec90a4fde62a50fafc8ab607ff1d8664? I know this would lead to
> ABI breakage, but only for extensions using sizeof(ReplicationSlot),
> if any. We can try to identify how many extensions rely on
> sizeof(ReplicationSlot) and then decide accordingly? We recently did
> something similar for another backbranch fix [1] which requires adding
> members at the end of structure.

I did some research in this area. There are extensions, which access
ReplicationSlotCtl->replication_slots[], which is sensitive to the
sizeof(ReplicationSlot). Some examples from github include.

pglogical fork from Adjust
https://github.com/adjust/pglogical/blob/b65b21c940b0eec8e49ec55a3e07521615485250/pglogical_functions.c#L248

pg_failover_slots from EDB
https://github.com/EnterpriseDB/pg_failover_slots/blob/master/pg_failover_slots.c#L1229

spock logical replication from pgEdge
https://github.com/pgEdge/spock/blob/3cf97588eab7c6ecb1f117d7349bce9d779a28a8/src/spock_monitoring.c#L79

pg_squeeze from Cybertec
https://github.com/cybertec-postgresql/pg_squeeze/blob/66ac647c0a2b5ff716aabc055886e47c4cfa3baf/worker.c#L1773

pgactive from AWS
https://github.com/aws/pgactive/blob/6c3ccc77531ed03f2051fb6a9e7ad532cd223545/src/pgactive_perdb.c#L318

There are probably much more. Here is the query I used for the search.
https://github.com/search?q=%22ReplicationSlotCtl-%3Ereplication_slots%22+language%3AC+NOT+repo%3Apostgres%2Fpostgres+NOT+fork%3Apostgres%2Fpostgres&type=code

I think this is enough to give up on idea to change
sizeof(ReplicationSlot) on back branches.

------
Regards,
Alexander Korotkov
Supabase

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2025-12-18 12:00:12 Re: Logical Replication of sequences
Previous Message Aya Iwata (Fujitsu) 2025-12-18 11:47:31 RE: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE