Re: Fix primary crash continually with invalid checkpoint after promote

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zhao Rui <875941708(at)qq(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, masao(dot)fujii(at)oss(dot)nttdata(dot)com, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: Fix primary crash continually with invalid checkpoint after promote
Date: 2022-04-26 19:47:13
Message-ID: 258772.1651002433@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

"Zhao Rui" <875941708(at)qq(dot)com> writes:
> A newly promoted primary may be left with an invalid checkpoint.
> In function CreateRestartPoint, the control file is updated and old WAL files are removed. In some situations, however, the control file is not updated but the old WAL files are still removed. This produces an invalid checkpoint that refers to WAL that no longer exists. The crucial log messages are "invalid primary checkpoint record" and "could not locate a valid checkpoint record".
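
The failure mode in the quoted report can be illustrated with a toy model. This is only a sketch and is not PostgreSQL source: ToyCluster, toy_can_start, toy_restartpoint and both flags are hypothetical names invented for illustration, and the guard shown is just one conceivable way to keep the two steps consistent, not necessarily what either proposed patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not PostgreSQL code. */
typedef struct {
    unsigned checkpoint_seg; /* WAL segment the control file's checkpoint lives in */
    unsigned oldest_seg;     /* oldest WAL segment still present on disk */
} ToyCluster;

/* Startup can find its checkpoint only if that segment still exists. */
static bool toy_can_start(const ToyCluster *c)
{
    return c->checkpoint_seg >= c->oldest_seg;
}

/*
 * Toy restartpoint.
 * update_control == false stands in for the "some situations" in the
 * report where the control file is left pointing at the old checkpoint.
 * guard_removal == true skips WAL removal whenever the control file
 * was not updated.
 */
static void toy_restartpoint(ToyCluster *c, unsigned new_seg,
                             bool update_control, bool guard_removal)
{
    assert(new_seg >= c->oldest_seg); /* restartpoints only move forward */

    if (update_control)
        c->checkpoint_seg = new_seg;

    /* Remove every segment older than new_seg, unless the guard says no. */
    if (update_control || !guard_removal)
        c->oldest_seg = new_seg;
}
```

Starting from a cluster whose checkpoint and oldest segment are both 3, calling toy_restartpoint with new_seg 7, update_control false and guard_removal false leaves checkpoint_seg at 3 while oldest_seg becomes 7, so toy_can_start returns false. That mirrors the "could not locate a valid checkpoint record" symptom. With guard_removal true, the same call leaves the old segment in place and startup still succeeds.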

I believe this is the same issue being discussed here:

https://www.postgresql.org/message-id/flat/20220316.102444.2193181487576617583.horikyota.ntt%40gmail.com

but Horiguchi-san's proposed fix looks quite different from yours.

regards, tom lane
