RE: Newly created replication slot may be invalidated by checkpoint

From: "Vitaly Davydov" <v(dot)davydov(at)postgrespro(dot)ru>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Cc: suyu(dot)cmj <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, "aekorotkov" <aekorotkov(at)gmail(dot)com>, "tomas" <tomas(at)vondra(dot)me>, "michael" <michael(at)paquier(dot)xyz>, bharath(dot)rupireddyforpostgres <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Newly created replication slot may be invalidated by checkpoint
Date: 2025-09-24 14:45:36
Message-ID: 1596c1-68d40400-9-93b4080@17709609
Lists: pgsql-hackers

Dear Amit, Hayato

On Wednesday, September 24, 2025 14:31 MSK, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:

>> I was thinking some more about this solution. Won't it lead to the
>> same problem if ReplicationSlotReserveWal() calls
>> ReplicationSlotsComputeRequiredLSN() after the above calculation of
>> checkpointer?

> Exactly. I verified that in your patch, the invalidation can still happen if
> we cannot finish the LSN computation before the KeepLogSegments().

Yes. The WAL reservation actually takes effect at the call to
ReplicationSlotsComputeRequiredLSN, which updates the oldest slot LSN
(XLogCtl->replicationSlotMinLSN). If that call happens between KeepLogSeg and
RemoveOldXlogFiles, the reservation is not taken into account. This behaviour
seems to have existed before commit 2090edc6f32f652a2c, but the probability of
the race was very low because of the short interval between KeepLogSeg and
RemoveOldXlogFiles. Commit 2090edc6f32f652a2c increased the probability of the
race because CheckPointGuts can take a long time to execute.
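
To make the window concrete, here is a minimal single-threaded sketch of the
interleaving described above. It is not PostgreSQL code: the toy_* names, the
16-byte segment size and the LSN values are all made up for illustration, and
the only thing modelled is that the removal cutoff is computed once from the
slot minimum and never re-read before old segments are removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLsn;        /* hypothetical stand-in for XLogRecPtr */
#define TOY_SEG_SIZE 16u        /* toy segment size, not the real one */
#define TOY_INVALID_LSN 0u      /* "no slot requires any WAL" */

/*
 * Loosely modelled on what KeepLogSeg decides: the oldest segment number
 * that must be kept, given the slot minimum visible at this instant.
 */
uint64_t toy_keep_log_seg(ToyLsn redo_lsn, ToyLsn slot_min_lsn)
{
    ToyLsn keep = redo_lsn;

    if (slot_min_lsn != TOY_INVALID_LSN && slot_min_lsn < keep)
        keep = slot_min_lsn;
    return keep / TOY_SEG_SIZE;
}

/*
 * Run one toy checkpoint. Returns true if the segment holding the new
 * slot's restart LSN ends up below the removal cutoff.
 */
bool toy_slot_loses_wal(bool reserve_inside_window)
{
    ToyLsn shared_min = TOY_INVALID_LSN;  /* plays replicationSlotMinLSN */
    const ToyLsn redo_lsn = 100;          /* toy redo pointer */
    const ToyLsn slot_restart = 40;       /* toy LSN the new slot needs */
    uint64_t cutoff;

    /* Case A: the slot publishes its requirement before the cutoff. */
    if (!reserve_inside_window)
        shared_min = slot_restart;

    /* Checkpointer computes the cutoff from what it can see now. */
    cutoff = toy_keep_log_seg(redo_lsn, shared_min);

    /* Case B: the requirement is published only after the cutoff. */
    if (reserve_inside_window)
        shared_min = slot_restart;

    /* In both cases the requirement is visible by removal time ... */
    assert(shared_min == slot_restart);

    /* ... but removal uses the cutoff computed earlier, not shared_min. */
    return slot_restart / TOY_SEG_SIZE < cutoff;
}
```

With these toy values, toy_slot_loses_wal(true) reports that the slot's
segment is removed, while toy_slot_loses_wal(false) reports that it is kept.
The longer the gap between the two steps, the more likely case B becomes.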

The attached patch doesn't solve the original problem completely, but it
reduces the probability of the race to what it was before the commit.
I propose applying this patch and then thinking about how to resolve the race
condition itself, which seems to be present in 18 and master as well.

I updated the patch by improving some comments as suggested by Amit.

With best regards,
Vitaly

Attachment Content-Type Size
v2-0001-Fix-invalidation-when-slot-is-created-during-checkpo.patch text/x-patch 3.9 KB
