Quick Links

Re: Incorrect logic in XLogNeedsFlush()

From:	Melanie Plageman <melanieplageman(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject:	Re: Incorrect logic in XLogNeedsFlush()
Date:	2025-09-10 15:12:37
Message-ID:	CAAKRu_bK+O56ZtoaZV0pjTK=VffMLyApFaKpX+mmkx8ichBaYg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Sep 10, 2025 at 3:58 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Sep 10, 2025 at 12:48:21PM +0530, Dilip Kumar wrote:
> >> So, looking at the code, it seems like most places we examine
> >> ControlFile->minRecoveryPoint, we condition it on "InArchiveRecovery".
> >> Is this something we want to do in XLogNeedsFlush() to avoid reading
> >> from ControlFile->minRecoveryPoint()?
>
> How would that help when the checkpointer does XLogNeedsFlush() calls,
> which is where you are reporting the disturbance is during the
> end-of-recovery checkpoint? InArchiveRecovery is only true in the
> startup process, set when we are doing archive recovery. This can be
> switched from the beginning of recovery or later, once all the local
> WAL has been replayed.

Ah right, I made the same mistake with the InRecovery global variable
initially too. This is my first time reading this code. I thought that
there needed to be something guarding processes other than the startup
process from reading the minimum recovery point from the ControlFile
during crash recovery until it is valid. But perhaps not -- more on
that below.

> In the case of 015_promotion_pages, I can see that when the check you
> are adding fails, the LSN is already flushed by XLogFLush(). So,
> while I don't see any active bug here, I tend to agree with your
> position that XLogNeedsFlush() could be improved in the context of the
> end-of-recovery checkpoint or just when a process is allowed WAL
> inserts while we are still in recovery.
>
> You have a good point here, especially knowing that for
> CHECKPOINT_END_OF_RECOVERY case in CreateCheckPoint(), called by the
> checkpointer, we allow WAL inserts with LocalSetXLogInsertAllowed().
> So, relying on RecoveryInProgress() in this context leads to a wrong
> result with XLogNeedsFlush(): WAL inserts are allowed, the minimum
> recovery point is not relevant for the evaluation.

So, would you consider the defining characteristic of whether or not
we should use the flush pointer instead of min recovery point in
XLogNeedsFlush() to be whether or not WAL inserts are allowed?

e.g.

@@ -3115,7 +3115,7 @@ XLogNeedsFlush(XLogRecPtr record)
* instead. So "needs flush" is taken to mean whether minRecoveryPoint
* would need to be updated.
*/
- if (RecoveryInProgress())
+ if (RecoveryInProgress() && !XLogInsertAllowed())

If I look at the difference between XLogFlush() and XLogNeedsFlush(),
it checks if XLogInsertAllowed() and, if not, does not try to flush
WAL -- which makes sense. There shouldn't be WAL to flush if we aren't
allowed to insert any.

In XLogFlush() -> UpdateMinRecoveryPoint() (called after checking if
WAL inserts are allowed), it has similar logic to XLogNeedsFlush():

if (!updateMinRecoveryPoint || (!force && lsn <= LocalMinRecoveryPoint))
return;
if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery)
updateMinRecoveryPoint = false;

The order of these two if statements is reversed in XLogNeedsFlush()

So, what I don't understand is: why it would be okay for processes
other than the startup process to read the minimum recovery point from
the control file during crash recovery before it is valid.

It seems in crash recovery, a backend other than the startup process
will fail this test:

if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery)

and then potentially read from the control file

LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;

There is a comment right after that says

/*
* Check minRecoveryPoint for any other process than the startup
* process doing crash recovery, which should not update the control
* file value if crash recovery is still running.
*/

Which seems to say that it is okay for other processes than the
startup process to read the minimum recovery point from the Control
File during crash recovery. But I don't quite understand why.

> Or are you worried about the case of GetVictimBuffer() where a second
> call of XLogNeedsFlush() lives?

I want it to be fixed because of another patch I am writing to do
batched writes with smgrwritev() [1]. In this case, I've split up the
code in FlushBuffer() and flush the WAL to the required LSN in a
different place in code than where I call smgrwritev(). So, I want to
add an assert-only verification that checks that none of the buffer's
need WAL flushed. This would be invoked by any process calling
FlushBuffer(), so it would have to work for all types of backends.

- Melanie

[1] https://www.postgresql.org/message-id/flat/CAAKRu_Yurj0sabXZB5EVqO3fKgwN1vELe1H4e0s1zXV3MKbZjQ%40mail.gmail.com#c8caa5ff77d89e75b6f56f16619a77b7

In response to

Re: Incorrect logic in XLogNeedsFlush() at 2025-09-10 07:58:37 from Michael Paquier

Responses

Re: Incorrect logic in XLogNeedsFlush() at 2025-09-10 16:58:08 from Jeff Davis
Re: Incorrect logic in XLogNeedsFlush() at 2025-09-10 23:18:13 from Michael Paquier

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jacob Champion	2025-09-10 16:02:25	Re: OAuth client code doesn't work with Google OAuth
Previous Message	Melanie Plageman	2025-09-10 15:12:11	Re: Incorrect logic in XLogNeedsFlush()