Re: Incorrect logic in XLogNeedsFlush()

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Melanie Plageman <melanieplageman(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject:	Re: Incorrect logic in XLogNeedsFlush()
Date:	2025-09-17 22:50:06
Message-ID:	aMs7Hjkjt-AApPe5@paquier.xyz
Lists:	pgsql-hackers

On Wed, Sep 17, 2025 at 07:05:32PM +0530, Dilip Kumar wrote:
> In a recovery scenario, the ControlFile->minRecoveryPoint on a standby
> server is continuously updated. This ensures that even in the event of
> a crash, a valid recovery point is available. However, if a crashed
> standalone server is started as a standby, the
> ControlFile->minRecoveryPoint is initially unset, and this will be set
> in ReadRecord()->SwitchIntoArchiveRecovery() once all the WAL from
> pg_wal directory are applied, this happens right before start
> streaming from primary.

This last sentence is incomplete. This happens right before streaming
or before reading WAL from the archives, where the source of WAL is
switched when the startup process waits for some WAL to become
available (when fetching a new page). If streaming and archives are
set, the order is:
1) local pg_wal
2) archives
3) streaming

That's also documented. The startup process can switch between 2 and
3 depending on the availability of the streaming source.
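
As a rough illustration of the ordering above, here is a toy state
machine. It is only a sketch: the enum and function names are made up
for this example and are not the identifiers used in the PostgreSQL
source, and it assumes that both archives and streaming are configured.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only, not PostgreSQL's own. */
typedef enum WalSource
{
    SRC_LOCAL_PG_WAL,   /* 1) WAL already present in pg_wal */
    SRC_ARCHIVE,        /* 2) WAL fetched from the archives */
    SRC_STREAM          /* 3) WAL streamed from the primary */
} WalSource;

/*
 * Pick the source to try next once the current one has no more WAL to
 * offer.  stream_available is an assumed flag saying whether the
 * streaming source can currently be used.
 */
static WalSource
next_wal_source(WalSource current, bool stream_available)
{
    switch (current)
    {
        case SRC_LOCAL_PG_WAL:
            /* Local WAL is exhausted: move on to the archives. */
            return SRC_ARCHIVE;
        case SRC_ARCHIVE:
        case SRC_STREAM:
            /* From here on, bounce between archives and streaming. */
            return stream_available ? SRC_STREAM : SRC_ARCHIVE;
    }
    return SRC_ARCHIVE;     /* not reached, keeps compilers quiet */
}
```

The local pg_wal step is only visited once in this sketch, while steps
2 and 3 can alternate for as long as recovery runs.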

> Until the startup process has completed the crash recovery, i.e. has
> applied the WAL up to 'ControlFile->minRecoveryPoint', logically
> 'bgwriter' can not write any buffer whose WAL location is beyond the
> ControlFile->minRecoveryPoint because we haven't yet applied those
> WALs and also there should not be any RestartPoint()/Checkpoint record
> before reaching the ControlFile->minRecoveryPoint otherwise we would
> have started crash recovery from that point.
>
> During the crash recovery process, before the WAL has been fully
> applied up to the ControlFile->minRecoveryPoint, the bgwriter cannot
> write any buffers that have a page LSN beyond this min recovery point.
> This is because those WALs which are beyond min recovery point have
> not yet been applied, so any buffer can not have page_lsn beyond this
> point.

Yep. Doing so is pointless: a newer page version is already on disk,
and we expect crash recovery to replay up to a point where it can
recover.
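
To make the comparison concrete, here is a simplified sketch of the
kind of check under discussion. It is not the actual XLogNeedsFlush()
code: the type, macro and function names are invented for this example,
and it assumes that an unset minimum recovery point is represented by
zero and that the caller is already known to be in recovery.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* hypothetical stand-in for a WAL location */
#define INVALID_LSN ((Lsn) 0)   /* assumed marker for "not set" */

/*
 * While in recovery, does a page with this LSN lie beyond the minimum
 * recovery point, so that the point would have to be advanced before
 * the page can be written out?
 */
static bool
page_beyond_min_recovery_point(Lsn page_lsn, Lsn min_recovery_point)
{
    /* Crash recovery: the minimum recovery point is not set yet. */
    if (min_recovery_point == INVALID_LSN)
        return false;

    /* Pages at or before the minimum recovery point are covered. */
    return page_lsn > min_recovery_point;
}
```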

> And as far as the checkpointer is concerned, on standby systems, we
> only execute RestartPoint when a Checkpoint WAL is detected.

Not exactly. Restart points can be triggered on-the-fly as well
depending on the server activity. It's OK to run a manual CHECKPOINT
on a standby, for example.

> And logically, no Checkpoint WAL should exist between the recovery start
> point and minRecoveryPoint, as crash recovery initiates from the most
> recent checkpoint redo.

We allow restart points even during crash recovery to make restarts
more responsive. If the crash recovery phase is very long, this
matters because we don't want to replay a bunch of WAL again if the
standby is stopped before crash recovery finishes but has had enough
room to begin and finish a restart point. See 7ff23c6d277d.
--
Michael
