Re: Change checkpoint‑record‑missing PANIC to FATAL

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Change checkpoint‑record‑missing PANIC to FATAL
Date: 2026-03-10 03:58:36
Message-ID: aa-W7D-1ONpwIxwO@paquier.xyz
Lists: pgsql-hackers

On Fri, Mar 06, 2026 at 11:02:41AM +0530, Nitin Jadhav wrote:
> Patch 0001 adjusts the error severity during crash recovery when the
> checkpoint record referenced by pg_control cannot be located and no
> backup_label file is present. The error is lowered from PANIC to
> FATAL. This patch also adds a new TAP test that verifies startup fails
> with a clear FATAL error. The test is straightforward: it removes the
> WAL segment containing the checkpoint record and confirms that the
> server reports the expected error.

There is actually one way 0001 could be unstable. One could imagine
that getting the segment from a live server could fail: if the run is
very slow, we may end up with an incorrect record. It would be
possible to use pg_controldata once the server has been shut down, but
we should be on the same segment anyway because nothing happens on the
server, so I have left the test as you suggested, and applied this
one. You missed an update in meson.build, and I am not sure that there
was a need to rename 050, either.
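
For illustration only, here is a minimal sketch of how a checkpoint
location, such as the "Latest checkpoint location" value printed by
pg_controldata, maps to the name of the WAL segment that holds it. It
is not taken from the patch or from the TAP test. The helper names
(parse_lsn, wal_segment_name) are hypothetical, and the default 16MB
segment size and timeline 1 are assumptions.

```python
# Sketch only: helper names are hypothetical, not part of the patch.
# Assumes the default 16MB WAL segment size and timeline 1 unless
# other values are passed in.

def parse_lsn(lsn: str) -> int:
    """Convert an LSN written as 'X/Y' (two hex numbers) to an integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_segment_name(lsn: str, timeline: int = 1,
                     wal_segment_size: int = 16 * 1024 * 1024) -> str:
    """Return the 24-character WAL file name containing the given LSN."""
    # Segment number counted from the start of the WAL stream.
    segno = parse_lsn(lsn) // wal_segment_size
    # Number of segments per 4GB "xlogid" unit.
    segs_per_xlogid = 0x100000000 // wal_segment_size
    return "%08X%08X%08X" % (timeline,
                             segno // segs_per_xlogid,
                             segno % segs_per_xlogid)


if __name__ == "__main__":
    # Example LSN chosen for illustration, not taken from a real run.
    print(wal_segment_name("0/3000028"))
```

With a helper like this, the segment could be derived from the control
file of a cleanly stopped server rather than from a live one, which is
the alternative mentioned above.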

> Missing checkpoint WAL segment referenced by backup_label:
> This test uses an online backup to create a backup_label file,
> extracts the checkpoint record information from it, removes the
> corresponding WAL segment, and verifies that the server reports the
> expected error.
>
> Missing redo WAL segment referenced by the checkpoint:
> In this test, redo and checkpoint records are forced into different
> WAL segments using injection points. A cold backup is then taken, with
> an explicit backup_label created in the restored cluster. The WAL
> segment containing the redo record is removed, and startup is expected
> to fail with the appropriate error message.

These two are very close to the existing tests that we have on HEAD
now. Could it be better to group them together instead? In particular,
054 reuses the injection point trick in what is clearly a copy-paste
taken from 050. Avoiding this duplication would be nice.
--
Michael
