
Re: Change checkpoint‑record‑missing PANIC to FATAL

From:	Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Change checkpoint‑record‑missing PANIC to FATAL
Date:	2026-03-20 14:26:46
Message-ID:	CAMm1aWZOrSePJHns2LHdvHbTj1y-40Jgg4um2AwhQGDGotPiyQ@mail.gmail.com
Lists:	pgsql-hackers

> There could be one possibility for 0001 to be unstable, actually. One
> could imagine that by getting the segment from a live server things
> could fail: if the run is very slow then we may finish with an
> incorrect record. It would be possible to use pg_controldata once the
> server has been shut down, but we should be on the same segment anyway
> because nothing happens on the server, so I have left the test as you
> have suggested, and applied this one. You have missed an update in
> meson.build, and I am not sure that there was a need for renaming 050,
> either.

Good catch—thanks. This does seem like it could apply to the earlier
redo‑record‑missing case as well. That said, since there’s no activity
on the server during the test, it should be safe as you mentioned. My
apologies for missing the meson.build update again—thanks for taking
care of it. On the renaming, I was aiming to separate things into four
tests to make the differences explicit, but I’m happy to stick with
two test files.

> These two are very close to the existing tests that we have on HEAD
> now, could it be better to group them together instead? Particularly,
> 054 reuses the injection point trick in what is clearly a copy-paste
> taken from 050. Avoiding this duplication may be nice.

That's a good point. I agree these new cases are very close to what we
already have on HEAD. I can rework this to avoid duplication by
grouping them with the existing tests.

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

On Tue, Mar 10, 2026 at 9:28 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Mar 06, 2026 at 11:02:41AM +0530, Nitin Jadhav wrote:
> > Patch 0001 adjusts the error severity during crash recovery when the
> > checkpoint record referenced by pg_control cannot be located and no
> > backup_label file is present. The error is lowered from PANIC to
> > FATAL. This patch also adds a new TAP test that verifies startup fails
> > with a clear FATAL error. The test is straightforward: it removes the
> > WAL segment containing the checkpoint record and confirms that the
> > server reports the expected error.
>
> There could be one possibility for 0001 to be unstable, actually. One
> could imagine that by getting the segment from a live server things
> could fail: if the run is very slow then we may finish with an
> incorrect record. It would be possible to use pg_controldata once the
> server has been shut down, but we should be on the same segment anyway
> because nothing happens on the server, so I have left the test as you
> have suggested, and applied this one. You have missed an update in
> meson.build, and I am not sure that there was a need for renaming 050,
> either.
>
> > Missing checkpoint WAL segment referenced by backup_label:
> > This test uses an online backup to create a backup_label file,
> > extracts the checkpoint record information from it, removes the
> > corresponding WAL segment, and verifies that the server reports the
> > expected error.
> >
> > Missing redo WAL segment referenced by the checkpoint:
> > In this test, redo and checkpoint records are forced into different
> > WAL segments using injection points. A cold backup is then taken, with
> > an explicit backup_label created in the restored cluster. The WAL
> > segment containing the redo record is removed, and startup is expected
> > to fail with the appropriate error message.
>
> These two are very close to the existing tests that we have on HEAD
> now, could it be better to group them together instead? Particularly,
> 054 reuses the injection point trick in what is clearly a copy-paste
> taken from 050. Avoiding this duplication may be nice.
> --
> Michael
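
For readers following the pg_controldata suggestion above, here is a minimal
sketch (not part of the patch, whose tests are TAP tests) of how the WAL
segment holding the checkpoint record could be derived from pg_controldata
output taken after shutdown. The helper names parse_lsn, wal_file_name and
checkpoint_segment are hypothetical; the field labels are those printed by
pg_controldata, and the file name layout follows the usual
timeline/log/segment scheme of three 8-digit hexadecimal fields.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/3000060' into a 64-bit byte position."""
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(tli: int, lsn: int, wal_segment_size: int = 16 * 1024 * 1024) -> str:
    """Return the name of the WAL segment file that contains the given LSN."""
    segments_per_xlogid = 0x100000000 // wal_segment_size
    segno = lsn // wal_segment_size
    return "%08X%08X%08X" % (
        tli,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


def checkpoint_segment(controldata: str) -> str:
    """Find the checkpoint record's WAL segment from pg_controldata output.

    Assumes the output contains the 'Latest checkpoint location',
    "Latest checkpoint's TimeLineID" and 'Bytes per WAL segment' lines.
    """
    fields = {}
    for line in controldata.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    lsn = parse_lsn(fields["Latest checkpoint location"])
    tli = int(fields["Latest checkpoint's TimeLineID"])
    segment_size = int(fields["Bytes per WAL segment"])
    return wal_file_name(tli, lsn, segment_size)
```

Reading these values from a stopped cluster avoids the race discussed above,
since pg_control can no longer change between the lookup and the removal of
the segment.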

In response to

Re: Change checkpoint‑record‑missing PANIC to FATAL at 2026-03-10 03:58:36 from Michael Paquier

Responses

Re: Change checkpoint‑record‑missing PANIC to FATAL at 2026-03-23 00:55:24 from Michael Paquier
