Standby got invalid primary checkpoint after crashed right after promoted.

From:	hao harry <harry-hao(at)outlook(dot)com>
To:	"pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Standby got invalid primary checkpoint after crashed right after promoted.
Date:	2022-03-16 07:16:16
Message-ID:	9EB4CF63-1107-470E-B5A4-061FB9EF8CC8@outlook.com
Lists:	pgsql-hackers

Hi, pgsql-hackers,

I think I have found a case in which the database becomes unrecoverable. Would you please take a look?

Here is how it happens:

- Set up a primary and a standby.
- Run a lot of INSERTs on the primary.
- Create a checkpoint on the primary.
- Wait until the standby starts a restart point; syncing buffers takes about 3 minutes to complete.
- Before the restart point updates ControlFile, promote the standby. Promotion changes
ControlFile->state to DB_IN_PRODUCTION, which makes the restart point skip its update to
ControlFile and leaves ControlFile->checkPoint pointing to a removed file.
- Before the promoted standby requests the post-recovery checkpoint (fast promotion),
one backend crashes. That kills the other server processes, so the post-recovery checkpoint is skipped.
- The database restarts the startup process, which reports: "could not locate a valid checkpoint record"
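
To make the ordering in the steps above easier to follow, here is a small toy model in Python. It is only a sketch of the sequence as described in this report, not PostgreSQL code: the class ToyCluster, its methods, and the integer "segment numbers" are all hypothetical, and it assumes (as the report states) that the restart point removes old WAL even when it skips the ControlFile update.

```python
from dataclasses import dataclass, field

# State names borrowed from the report; everything else here is hypothetical.
DB_IN_ARCHIVE_RECOVERY = "DB_IN_ARCHIVE_RECOVERY"
DB_IN_PRODUCTION = "DB_IN_PRODUCTION"


@dataclass
class ToyCluster:
    """Toy stand-in for a standby: a control file plus a set of WAL segments."""
    state: str = DB_IN_ARCHIVE_RECOVERY
    control_checkpoint: int = 1  # segment holding the checkpoint the control file points to
    wal_segments: set = field(default_factory=lambda: {1, 2, 3})

    def promote(self):
        # Promotion flips the control file state.
        self.state = DB_IN_PRODUCTION

    def finish_restart_point(self, new_checkpoint):
        # Assumption from the report: the control file is updated only
        # while the state is still "in archive recovery" ...
        if self.state == DB_IN_ARCHIVE_RECOVERY:
            self.control_checkpoint = new_checkpoint
        # ... but segments older than the new checkpoint are removed either way.
        self.wal_segments = {s for s in self.wal_segments if s >= new_checkpoint}

    def post_recovery_checkpoint(self, new_checkpoint):
        # The checkpoint requested after promotion repairs the pointer.
        self.control_checkpoint = new_checkpoint

    def can_restart(self):
        # Crash recovery needs the checkpoint record the control file points to.
        return self.control_checkpoint in self.wal_segments


def run_scenario(promote_mid_restart_point, crash_before_checkpoint):
    c = ToyCluster()
    if promote_mid_restart_point:
        c.promote()               # promotion lands before the restart point finishes
    c.finish_restart_point(3)
    if not promote_mid_restart_point:
        c.promote()               # promotion after the restart point completed
    if not crash_before_checkpoint:
        c.post_recovery_checkpoint(3)
    return c.can_restart()
```

In this model only the combination run_scenario(True, True) leaves the control file pointing at a removed segment; if either the promotion comes after the restart point or the post-recovery checkpoint gets to run, the toy cluster can still restart.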

I attached a test to reproduce it. It does not fail every time; for me it fails about once in every 10 runs.
Patch 0001 is needed to increase the chance that CreateRestartPoint skips the ControlFile update
and to simulate a crash.

Best regards,

Harry Hao

Attachment	Content-Type	Size
0001-Patched-CreateRestartPoint-to-reproduce-invalid-chec.patch	application/octet-stream	2.6 KB
reprod_crash_right_after_promoted.pl	text/x-perl-script	2.2 KB

Responses

Re: Standby got invalid primary checkpoint after crashed right after promoted. at 2022-03-16 08:21:46 from hao harry
Re: Standby got invalid primary checkpoint after crashed right after promoted. at 2022-03-16 08:28:45 from Kyotaro Horiguchi
