From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: Incorrect logic in XLogNeedsFlush() |
Date: | 2025-09-10 23:18:13 |
Message-ID: | aMIHNRTP6Wj6vw1s@paquier.xyz |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Sep 10, 2025 at 11:12:37AM -0400, Melanie Plageman wrote:
> On Wed, Sep 10, 2025 at 3:58 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> You have a good point here, especially knowing that for
>> CHECKPOINT_END_OF_RECOVERY case in CreateCheckPoint(), called by the
>> checkpointer, we allow WAL inserts with LocalSetXLogInsertAllowed().
>> So, relying on RecoveryInProgress() in this context leads to a wrong
>> result with XLogNeedsFlush(): WAL inserts are allowed, the minimum
>> recovery point is not relevant for the evaluation.
>
> So, would you consider the defining characteristic of whether or not
> we should use the flush pointer instead of min recovery point in
> XLogNeedsFlush() to be whether or not WAL inserts are allowed?
Yeah, using the flush pointer makes sense if WAL inserts are allowed,
even if that's just temporary. We're telling the process where the
inserts are allowed that we are not officially out of recovery yet,
but close to it. If your suggestion is to update XLogNeedsFlush() so
as it uses !XLogInsertAllowed() rather than RecoveryInProgress(), that
makes sense to me. It looks like we've never really had a case up to
now where XLogNeedsFlush() would matter for the processes where
LocalSetXLogInsertAllowed() has been called to temporarily allow WAL
inserts.
LocalSetXLogInsertAllowed() is not a one-way trip as far as I recall,
with the checkpointer being a special case.
> In XLogFlush() -> UpdateMinRecoveryPoint() (called after checking if
> WAL inserts are allowed), it has similar logic to XLogNeedsFlush():
>
> if (!updateMinRecoveryPoint || (!force && lsn <= LocalMinRecoveryPoint))
> return;
> if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery)
> updateMinRecoveryPoint = false;
>
> The order of these two if statements is reversed in XLogNeedsFlush()
You mean in XLogNeedsFlush()->RecoveryInProgress() for the first set
of checks, and the top of UpdateMinRecoveryPoint() for the second set
of similar checks. Alexander Kukushkin may have some test cases
fancier than the in-core tests to stress this code path, still +1
for more consistency.
> So, what I don't understand is: why it would be okay for processes
> other than the startup process to read the minimum recovery point from
> the control file during crash recovery before it is valid.
InvalidXLogRecPtr stored in ControlFile->minRecoveryPoint can be used
as a way to check if we are running crash recovery. I don't see why
we could not reverse the order of the checks to make both code paths
more consistent. Cannot think of a reason on top of my mind.
> It seems in crash recovery, a backend other than the startup process
> will fail this test:
>
> if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery)
Yes. InRecovery can only ever be true in the startup process.
Anything else will fail this test.
FYI, looking a bit around with the history of XLogNeedsFlush(), the
last changes were a few years ago, with c186ba135eca and 8d68ee6f31ca
(cough):
https://www.postgresql.org/message-id/20180831.145206.05203037.horiguchi.kyotaro@lab.ntt.co.jp
The only process calling UpdateMinRecoveryPoint() during crash
recovery should be the startup process, and I don't recall that we've
changed anything in this area lately to change this rule.
> There is a comment right after that says
>
> /*
> * Check minRecoveryPoint for any other process than the startup
> * process doing crash recovery, which should not update the control
> * file value if crash recovery is still running.
> */
>
> Which seems to say that it is okay for other processes than the
> startup process to read the minimum recovery point from the Control
> File during crash recovery. But I don't quite understand why.
minRecoveryPoint as an invalid LSN can allow another process to know
if we are in crash recovery. These days one could just use
GetRecoveryState() if they'd wish to know the state of recovery we are
in based on how much the startup process has replayed. I suspect that
this code could be simplified to rely more on GetRecoveryState(),
instead. This API has been introduced later than the recent changes
of XLogNeedsFlush(), as in 4e87c4836ab9 vs c186ba135eca. You are
right that there are some simplications that can be done here. That
looks like an independent list of patches, at least three:
1) The order of the sanity checks.
2) Potential simplifications in XLogNeedsFlush() with
GetRecoveryState().
3) Switch XLogNeedsFlush() to use !XLogInsertAllowed() rather than
RecoveryInProgress() that discard the local-wal-insert case, with the
addition of an extra assertion. bufmgr.c would be one location, I am
wondering if we should not plant that elsewhere, like in XLogFlush()
itself. There should be no risk of an infinite recursion, if my
reading of the code is right this morning.
Anyway, yes, it really comes down to the point that we've never
bothered using XLogNeedsFlush() to support your case. 3) would be
enough for your case, the other two seem like nice improvements to
make this part of the recovery code leaner.
>> Or are you worried about the case of GetVictimBuffer() where a second
>> call of XLogNeedsFlush() lives?
>
> I want it to be fixed because of another patch I am writing to do
> batched writes with smgrwritev() [1]. In this case, I've split up the
> code in FlushBuffer() and flush the WAL to the required LSN in a
> different place in code than where I call smgrwritev(). So, I want to
> add an assert-only verification that checks that none of the buffer's
> need WAL flushed. This would be invoked by any process calling
> FlushBuffer(), so it would have to work for all types of backends.
Adding an assertion based on needs-flush sounds like a good idea in
the context of what you are doing in the other patch, yes. Recovery
test 015 would fail as well in your case, right?
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-09-10 23:21:21 | Re: Incorrect logic in XLogNeedsFlush() |
Previous Message | Thomas Munro | 2025-09-10 22:59:19 | Re: VM corruption on standby |