RE: Newly created replication slot may be invalidated by checkpoint

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, suyu(dot)cmj <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, bharath(dot)rupireddyforpostgres <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Newly created replication slot may be invalidated by checkpoint
Date: 2025-11-11 03:36:32
Message-ID: OSCPR01MB149669889E3F3E384743D4B28F5CFA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Amit, Alexander,

> The fix seems to be only provided for bank branches, but IIUC the
> problem can happen in HEAD as well.

Yes, I confirmed this can happen. [1] can be applied atop HEAD and
reproduce the invalidation.

> In Head, how about acquiring
> ReplicationSlotAllocationLock in Exclusive mode during
> ReplicationSlotReserveWal? This lock is acquired in SHARE mode in
> CheckPointReplicationSlots. So, this should make our calculations
> correct and avoid invalidating the newly created slot.

I feel this can fix the issue. The idea can serialize CheckpointReplicationSlots()
and ReplicationSlotReserveWal(), and upcoming XLogGetReplicationSlotMinimumLSN()
can take care the newly created slot.

Also, since CheckpointReplicationSlots() is called after setting RedoRecPtr, it
is OK if ReplicationSlotReserveWal() is called after the CheckpointReplicationSlots().
In this case the candidate restart_lsn has larger LSN than RedoRecPtr and won't
be removed in the checkpoint.

> I feel with the proposed patches for back branches, the code is
> deviating too much and also makes it a bit complicated, which means it
> could be difficult to maintain it in the future. Can we consider
> reverting the original fix 2090edc6f32f652a2c995ca5f7e65748ae1e4c5d
> and make it the same as we did in HEAD
> ca307d5cec90a4fde62a50fafc8ab607ff1d8664?

Is it allowed to add new LWLocks for backbranched? I'm afraid some other extensions
might be affected.

[1]: https://www.postgresql.org/message-id/30938d-68ffa500-b-6328cc00%40139466859

Best regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ajin Cherian 2025-11-11 03:37:01 Re: Add support for COPY TO in tablesync for partitioned tables.
Previous Message Amit Kapila 2025-11-11 03:32:41 Re: Logical Replication of sequences