Re: Incorrect logic in XLogNeedsFlush()

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Incorrect logic in XLogNeedsFlush()
Date: 2025-09-14 08:09:51
Message-ID: CAFiTN-tK9n4J=oGsRvYjVydNDarSa=5AgjOghjdhBqGSfiR3VQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 12, 2025 at 10:02 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Fri, 2025-09-12 at 14:04 +0900, Michael Paquier wrote:
> > On Fri, Sep 12, 2025 at 10:21:27AM +0530, Dilip Kumar wrote:
> > > Yeah, asserting it at the end makes sense, as we can ensure that
> > > XLogFlush() and XLogNeedsFlush() agree on the same thing without
> > > adding additional overhead. Here is the revised one.
> >
> > Melanie and Jeff, what do you think?
>
> IIUC: during the end-of-recovery checkpoint, a hypothetical call to
> XLogNeedsFlush() would have incorrectly used the LocalMinRecoveryPoint;
> and with the change, it would correctly use the flush pointer.
>
> That wasn't a live bug because XLogNeedsFlush() was never actually
> called during the end-of-recovery checkpoint. CreateCheckPoint()
> called XLogFlush() directly, which correctly checked the flush pointer.
> But, hypothetically, if that code had logic based around
> XLogNeedsFlush() before calling XLogFlush(), that would have been a
> bug.
>
> So this fix has no behavior change, but it would make the code more
> resilient against changes to CreateCheckPoint(), more consistent, and
> less confusing. Is that right?

That's right.

> Assuming I'm right so far, then I agree with changing the test in
> XLogNeedsFlush() to !XLogInsertAllowed(), as the patch does.
>
> The comment in the patch is describing what's happening, but lost the
> "why". Perhaps something like:
>
> "During recovery, we don't flush WAL but update minRecoveryPoint
> instead. So "needs flush" is taken to mean whether minRecoveryPoint
> would need to be updated. The end-of-recovery checkpoint still must
> flush WAL like normal, though, so check !XLogInsertAllowed(). This
> check must be consistent with XLogFlush()."

Updated as per suggestion

> The commit message is vague about whether there's a live bug or not; I
> suggest clarifying that the inconsistency between the two functions
> could have been a bug but wasn't. Also, I generally use past tense for
> the stuff that's being fixed and present tense for what the patch does.

I tried to improve it in v2

> One loose end: there was some discussion about the times when
> LocalMinRecoveryPoint is valid/correct. I'm not sure I entirely
> understood -- is that no longer a concern?

IMHO during crash recovery LocalMinRecoveryPoint and
ControlFile->minRecoveryPoint can not initialized until we replay all
WAL so those should never get accessed. However if XLogNeedsFlush()
were called during the end of the recovery checkpoint it was accessing
this which was wrong although it was not a live bug and this patch is
making that more resilient. OTOH if we are creating a restartpoint
during standby replay or archive recovery then we do access
LocalMinRecoveryPoint and ControlFile->minRecoveryPoint because by
that time we have already done the minimum recovery require for
reaching to the consistent state and we would have initialized
ControlFile->minRecoveryPoint.

IMHO during crash recovery, the variables LocalMinRecoveryPoint and
ControlFile->minRecoveryPoint remain uninitialized until all required
WAL has been replayed. Consequently, they must not be accessed during
this phase. Previously, calling XLogNeedsFlush() during the "end of
recovery checkpoint" incorrectly accessed these uninitialized values.
While this was not a live bug, this patch hardens the code against
this type of race condition.

OTOH, when creating a RestartPoint during standby replay or doing
archive recovery, access to both LocalMinRecoveryPoint and
ControlFile->minRecoveryPoint is correct. By this stage, it has
completed the minimum recovery required to achieve a consistent state,
and ControlFile->minRecoveryPoint would have been properly
initialized.

--
Regards,
Dilip Kumar
Google

Attachment Content-Type Size
v2-0001-Make-XLogFlush-and-XLogNeedsFlush-decision-more-c.patch application/octet-stream 2.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikolay Shaplov 2025-09-14 08:49:34 Re: [PATCH] ternary reloption type
Previous Message Florian Weimer 2025-09-14 07:49:04 Ignoring symlinks when recovering time zone identifiers in initdb