| From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | suyu(dot)cmj <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, bharath(dot)rupireddyforpostgres <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Subject: | RE: Newly created replication slot may be invalidated by checkpoint |
| Date: | 2025-11-13 02:33:38 |
| Message-ID: | TY4PR01MB16907FA10BB27ABC97345065094CDA@TY4PR01MB16907.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wednesday, November 12, 2025 11:55 PMVitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
>
> On Monday, November 10, 2025 12:11 MSK, Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > The fix seems to be only provided for bank branches, but IIUC the
> > problem can happen in HEAD as well. In Head, how about acquiring
> > ReplicationSlotAllocationLock in Exclusive mode during
> > ReplicationSlotReserveWal? This lock is acquired in SHARE mode in
> > CheckPointReplicationSlots. So, this should make our calculations
> > correct and avoid invalidating the newly created slot.
>
> Amit, I'm not sure how it may help to avoid the change of redo rec ptr during
> wal reservation by a slot, because it doesn't serialize redo rec ptr assignment
> and slot's wal reservation, but the core issue is in reco rec ptr change when we
> reserve the wal by a slot.
I think the main issue here lies in the possibility that the minimum restart_lsn
obtained during a checkpoint could be less than the WAL position that is being
reserved concurrently. So, instead of serializing the redo assignment and WAL
reservation, Amit proposes serializing the CheckPointReplicationSlots() and WAL
reservation. This would ensure the following:
1) If the WAL reservation occurs first, the checkpoint must wait for the
restart_lsn to be updated before proceeding with WAL removal. This guarantees
that the most recent restart_lsn position is detected.
2) If the checkpoint calls CheckPointReplicationSlots() first, then any
subsequent WAL reservation must take a position later than the redo pointer.
I find this approach promising, and thus, I am sharing a POC for this approach on
HEAD. (I will keep testing the patch locally)
BTW, I am not adding a test using an injection point because, once we acquire a
lwlock in the WAL reserve function, it becomes impractical to insert
an injection point there. The reason is that the injection point function
internally calls CHECK_FOR_INTERRUPTS(), and the lightweight lock holds
interrupts, preventing their execution.
>
> On Monday, November 10, 2025 12:11 MSK, Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > I feel with the proposed patches for back branches, the code is
> > deviating too much and also makes it a bit complicated, which means it
> > could be difficult to maintain it in the future. Can we consider
> > reverting the original fix 2090edc6f32f652a2c995ca5f7e65748ae1e4c5d
> > and make it the same as we did in HEAD
> > ca307d5cec90a4fde62a50fafc8ab607ff1d8664? I know this would lead to
> > ABI breakage, but only for extensions using sizeof(ReplicationSlot),
> > if any. We can try to identify how many extensions rely on
> > sizeof(ReplicationSlot) and then decide accordingly? We recently did
> > something similar for another backbranch fix [1] which requires adding
> > members at the end of structure.
>
> If a decision to revert the patch is considered, I propose to consider one more
> way to implement a replacement patch where last_saved_restart_lsn is
> allocated in a separate array in shmem but not in ReplicationSlot struct, which
> size will be unchanged. The logic will be the same except of accessing this
> value.
> I think, I proposed such patch but it was rejected due to unwillingness to
> change data allocations in the shmem. If needed, I may prepare the patch.
Personally, allocating additional shared memory space seems somewhat hacky. I
prefer aligning the code with the current HEAD, as Amit suggested, to ensure
consistency and facilitate easier maintenance in the future. I can prepare
patches for back branches as well once an agreement is reached.
Best Regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Fix-unexpected-WAL-removal.patch | application/octet-stream | 3.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Manni Wood | 2025-11-13 02:40:35 | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Previous Message | Chao Li | 2025-11-13 02:20:30 | Re: Suggestion to add --continue-client-on-abort option to pgbench |