From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | harry-hao(at)outlook(dot)com |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Standby got invalid primary checkpoint after crashed right after promoted. |
Date: | 2022-03-16 08:28:45 |
Message-ID: | 20220316.172845.174794657076004563.horikyota.ntt@gmail.com |
Lists: | pgsql-hackers |
At Wed, 16 Mar 2022 07:16:16 +0000, hao harry <harry-hao(at)outlook(dot)com> wrote in
> Hi, pgsql-hackers,
>
> I think I found a case where the database is not recoverable; would you please take a look?
>
> Here is how it happens:
>
> - set up a primary and a standby
> - do a lot of INSERTs on the primary
> - create a checkpoint on the primary
> - wait until the standby starts a restartpoint; it takes about 3 minutes of syncing buffers to complete
> - before the restartpoint updates ControlFile, promote the standby; that changes
> ControlFile->state to DB_IN_PRODUCTION, which makes the restartpoint skip the ControlFile update,
> leaving ControlFile->checkPoint pointing to a removed file
Yeah, it seems like exactly the same issue pointed out in [1]. A fix is
proposed in [2]. Maybe I can remove "possible" from the mail subject :p
[1] https://www.postgresql.org/message-id/7bfad665-db9c-0c2a-2604-9f54763c5f9e%40oss.nttdata.com
[2] https://www.postgresql.org/message-id/20220316.102444.2193181487576617583.horikyota.ntt@gmail.com
> - before the promoted standby requests the post-recovery checkpoint (fast promotion),
> one backend crashes; this kills the other server processes, so the post-recovery checkpoint is skipped
> - the database restarts the startup process, which reports: "could not locate a valid checkpoint record"
>
> I attached a test to reproduce it. It does not fail every time; it fails about once in every 10 runs for me.
> To increase the chance that CreateRestartPoint skips the ControlFile update, and to simulate a crash,
> patch 0001 is needed.
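For readers following along, the reported sequence can be sketched as a toy model. This is only an illustration in Python, not PostgreSQL code: `ToyCluster`, `finish_restartpoint`, `crash_recovery_can_start` and `run` are hypothetical names, WAL segments are reduced to small integers, and the "skip the control file update but still remove old WAL" behaviour is modeled purely from the description quoted above.

```python
from dataclasses import dataclass, field

# State names borrowed from the report; everything else here is hypothetical.
DB_IN_ARCHIVE_RECOVERY = "DB_IN_ARCHIVE_RECOVERY"
DB_IN_PRODUCTION = "DB_IN_PRODUCTION"


@dataclass
class ToyCluster:
    """Toy stand-in for a standby: a control file plus a set of WAL segments."""
    state: str = DB_IN_ARCHIVE_RECOVERY
    checkpoint: int = 1  # segment holding the checkpoint record the control file points at
    wal_segments: set = field(default_factory=lambda: {1, 2, 3})

    def promote(self):
        # Promotion flips the control file state before the restartpoint finishes.
        self.state = DB_IN_PRODUCTION

    def finish_restartpoint(self, new_checkpoint):
        # As described in the report: the control file is updated only
        # while the server is still in archive recovery ...
        if self.state == DB_IN_ARCHIVE_RECOVERY:
            self.checkpoint = new_checkpoint
        # ... but segments older than the new restartpoint are removed either way.
        self.wal_segments = {s for s in self.wal_segments if s >= new_checkpoint}

    def crash_recovery_can_start(self):
        # Crash recovery needs the segment that holds the recorded checkpoint.
        return self.checkpoint in self.wal_segments


def run(promote_during_restartpoint):
    cluster = ToyCluster()
    if promote_during_restartpoint:
        cluster.promote()
    cluster.finish_restartpoint(new_checkpoint=3)
    return cluster.crash_recovery_can_start()


if __name__ == "__main__":
    print(run(False))  # restartpoint completes normally
    print(run(True))   # promotion races with the restartpoint
```

In this model, the run without promotion leaves the control file pointing at a segment that still exists, while the run with promotion leaves it pointing at a removed one, which corresponds to the "could not locate a valid checkpoint record" failure if a crash happens before the post-recovery checkpoint.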
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center