Re: [PATCH] Fix minRecoveryPoint not advanced past checkpoint in CreateRestartPoint

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Adam Lee <adam8157(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: [PATCH] Fix minRecoveryPoint not advanced past checkpoint in CreateRestartPoint
Date: 2026-04-01 09:38:15
Message-ID: 038d97bc-fbe4-4d99-b7a5-e468ef361123@iki.fi
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/04/2026 11:53, Adam Lee wrote:
> Hi hackers,
>
> I ran into this while working on recovery pre-check logic that relies on
> pg_controldata to verify whether replay has reached a specific restore point.
>
> Reproducer:
>
> ```
> -- on primary:
> CHECKPOINT;
> SELECT pg_create_restore_point('test_rp');
>
> -- recover with:
> -- recovery_target_name = 'test_rp'
> -- recovery_target_action = 'shutdown'
>
> -- after recovery shuts down:
> pg_controldata shows minRecoveryPoint 104 bytes behind
> pg_create_restore_point's return value (104 bytes = one
> RESTORE_POINT WAL record).
> ```
>
> My RCA:
>
> When recovery_target_action=shutdown triggers, the checkpointer performs a
> shutdown restartpoint via CreateRestartPoint(). If a CHECKPOINT record was
> replayed shortly before the recovery target, CreateRestartPoint advances
> minRecoveryPoint to the end of that CHECKPOINT record.
>
> However, any no-op records replayed after the CHECKPOINT (such as
> RESTORE_POINT) do not dirty pages, so the lazy minRecoveryPoint update that
> normally happens during page flushes never fires for them. As a result,
> minRecoveryPoint in pg_control ends up behind the actual replay position.

Hmm, what exactly does minRecoveryPoint mean? The current behavior is
correct in the sense that if you restarted recovery, you could still
stop the recovery at the earlier LSN that's the minRecoveryPoint in the
control file, and the system would be consistent. I agree it feels
pretty weird though, it would seem natural to advance minRecoveryPoint
to the last replayed record on a restartpoint.

> My Fix:
>
> The attached patch fixes this by reading the current replay position from
> shared memory after advancing minRecoveryPoint to the checkpoint end, and
> advancing further if replay has progressed past it. This is safe because
> CheckPointGuts() has already flushed all dirty buffers and the startup process
> has exited, so replayEndRecPtr is stable and all pages are on disk.

We have this comment earlier in CreateRestartPoint():

> * We don't explicitly advance minRecoveryPoint when we do create a
> * restartpoint. It's assumed that flushing the buffers will do that as a
> * side-effect.

That assumption is not quite right, then.

Perhaps we should simply call UpdateMinRecoveryPoint()? That would cause
the control file to be flushed twice though, so it's a little
inefficient, but maybe that's fine.

If we go with your patch, does it make this existing logic below obsolete?

> if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
> {
> if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
> {
> ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
> ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
>
> /* update local copy */
> LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
> LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
> }
> if (flags & CHECKPOINT_IS_SHUTDOWN)
> ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
> }

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2026-04-01 09:39:31 Re: Adding REPACK [concurrently]
Previous Message Zsolt Parragi 2026-04-01 09:35:18 Re: [oauth] Split and extend PGOAUTHDEBUG