From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, tomas(at)vondra(dot)me |
Subject: | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |
Date: | 2025-05-22 21:10:27 |
Message-ID: | CAPpHfduDLCrQ=vp1KBJu5fnSzgQp9ebGRV7i2poHK9VthwaL7g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, Vitaly!
On Tue, May 20, 2025 at 6:44 PM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
>
> Thank you very much for the review!
>
> > The patchset doesn't seem to build after 371f2db8b0, which adjusted
> > the signature of the INJECTION_POINT() macro. Could you, please,
> > update the patchset accordingly.
>
> I've updated the patch (see attached). Thanks.
>
> > I see in 0004 patch we're calling XLogGetReplicationSlotMinimumLSN()
> > before slots synchronization then use it for WAL truncation.
> > Generally looks good. But what about the "if
> > (InvalidateObsoleteReplicationSlots(...))" branch? It calls
> > XLogGetReplicationSlotMinimumLSN() again. Why would the value
> > obtained from the latter call reflect slots as they are synchronized
> > to the disk?
>
> In patch 0004 I call XLogGetReplicationSlotMinimumLSN() again to keep the old
> behaviour - this function was called in KeepLogSeg prior to my change. I also
> call CheckPointReplicationSlots at the next line to save the invalidated and
> other dirty slots on disk again to make sure, the new oldest LSN is in sync.
>
> The problem I tried to solve in this if-branch is to fix test
> src/test/recovery/t/019_replslot_limit.pl which was failed because the WAL was
> not truncated enought for the test to pass ok. In general, this branch is not
> necessary and we may fix the test by calling checkpoint twice (please, see the
> alternative.rej patch for this case). If you think, we should incorporate this
> new change, I'm ok to do it. But the WAL will be truncating more lazily.
>
> Furthermore, I think we can save slots on disk right after invalidation, not in
> CheckPointGuts to avoid saving invalidated slots twice.
Thank you for the clarification. It's all good. I just missed that
CheckPointReplicationSlots() syncs slots inside the "if" branch.
I've reordered the patchset. Fix should come first, tests comes
second. So, tests pass after each commit. Also I've joined both
tests and injection points into single commit. I don't see reason to
place tests into src/test/modules, because there is no module. I've
moved them into src/test/recovery.
I also improved some comments and commit messages. I think 0001
should go to all supported releases as it fixes material bug, while
0002 should be backpatched to 17, where injection points fist appears.
0003 should go to pg19 after branching. I'm continuing reviewing
this.
------
Regards,
Alexander Korotkov
Supabase
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Keep-WAL-segments-by-the-flushed-value-of-the-slo.patch | application/x-patch | 6.2 KB |
v4-0002-Add-TAP-tests-to-check-replication-slot-advance-d.patch | application/x-patch | 15.2 KB |
v4-0003-Remove-redundant-ReplicationSlotsComputeRequiredL.patch | application/x-patch | 6.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Melanie Plageman | 2025-05-22 21:18:57 | Re: Log connection establishment timings |
Previous Message | Peter Eisentraut | 2025-05-22 20:56:25 | Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part |