RE: Newly created replication slot may be invalidated by checkpoint

From: "Vitaly Davydov" <v(dot)davydov(at)postgrespro(dot)ru>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Alexander Korotkov" <aekorotkov(at)gmail(dot)com>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Cc: suyu(dot)cmj <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, "tomas" <tomas(at)vondra(dot)me>, "michael" <michael(at)paquier(dot)xyz>, bharath(dot)rupireddyforpostgres <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Newly created replication slot may be invalidated by checkpoint
Date: 2025-11-12 15:55:14
Message-ID: 1c87ef-6914ae00-7-3c088ec0@234619058
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, October 31, 2025 12:48 MSK, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> Firstly I considered an ad-hoc way, which sets the candidate restart_lsn as
> replicationSlotMinLSN before using as slot->data.restart_lsn. PSA the idea.
> It can fix your reproducer.

Hayato-san, thank you for the idea. Yes, it can fix the reproducer, but assume
the case when another backend calls ReplicationSlotsComputeRequiredLSN during
a new slot creation. It may reset the value which was set by
XLogMaybeSetReplicationSlotMinimumLSN. The slot may still be invalidated.

On Wednesday, November 05, 2025 13:17 MSK, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> Thank you for highlighting this scenario. I've reviewed it. I think
> we could avoid it by covering appropriate parts of
> ReplicationSlotReserveWal() and Create{Check|Restart}Point() by a new
> LWLock. The draft patch is attached. What do you think?

Alexander, thank you for the idea to use locks. I think, the lock may fix the
problem because we can not change the redo rec ptr during wal reservation by a
slot. Once, we serialize redo rec ptr assignment and wal reservation by a slot,
we fix this issue. Unfortunately, I can not apply the proposed test due
to a deadlock when waiting for injection points.

I think how to fix it without LW locks. The problem here is that the slot
may be invalidated immediately just after restart_lsn assignment. Without the
invalidation we may re-check that the assigned restart_lsn is still acceptable
To prevent it, we may implement a two-phase reservation in ReplicationSlotReserveWal.
We assign slot->data.restart_lsn and then compare it with RedoRecPtr just to make
sure, that it not less than the RedoRecPtr at the moment of comparison. If less,
repeat the loop. If, ok, confirm the assigned restart_lsn. The invalidation
function will ignore slots with unconfirmed restart_lsns. It would be better,
if the backend will suffer, than the checkpointer. Probably, a special flag like
slot->is_creating may help.

On Monday, November 10, 2025 12:11 MSK, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> The fix seems to be only provided for bank branches, but IIUC the
> problem can happen in HEAD as well. In Head, how about acquiring
> ReplicationSlotAllocationLock in Exclusive mode during
> ReplicationSlotReserveWal? This lock is acquired in SHARE mode in
> CheckPointReplicationSlots. So, this should make our calculations
> correct and avoid invalidating the newly created slot.

Amit, I'm not sure how it may help to avoid the change of redo rec ptr during
wal reservation by a slot, because it doesn't serialize redo rec ptr assignment
and slot's wal reservation, but the core issue is in reco rec ptr change
when we reserve the wal by a slot.

On Monday, November 10, 2025 12:11 MSK, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I feel with the proposed patches for back branches, the code is
> deviating too much and also makes it a bit complicated, which means it
> could be difficult to maintain it in the future. Can we consider
> reverting the original fix 2090edc6f32f652a2c995ca5f7e65748ae1e4c5d
> and make it the same as we did in HEAD
> ca307d5cec90a4fde62a50fafc8ab607ff1d8664? I know this would lead to
> ABI breakage, but only for extensions using sizeof(ReplicationSlot),
> if any. We can try to identify how many extensions rely on
> sizeof(ReplicationSlot) and then decide accordingly? We recently did
> something similar for another backbranch fix [1] which requires adding
> members at the end of structure.

If a decision to revert the patch is considered, I propose to consider one
more way to implement a replacement patch where last_saved_restart_lsn is
allocated in a separate array in shmem but not in ReplicationSlot struct, which
size will be unchanged. The logic will be the same except of accessing this value.
I think, I proposed such patch but it was rejected due to unwillingness to
change data allocations in the shmem. If needed, I may prepare the patch.

The attached patch is just an update of the original test. It adds the test to
the meson build file and adds one more scenario to check.

With best regards,
Vitaly

Attachment Content-Type Size
v2-Test-a-specific-case-of-physical-slot-invalidation.patch text/x-patch 7.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Timur Magomedov 2025-11-12 16:11:51 Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Previous Message Tomas Vondra 2025-11-12 15:39:50 Re: should we have a fast-path planning for OLTP starjoins?