From: | hao harry <harry-hao(at)outlook(dot)com> |
---|---|
To: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Standby got invalid primary checkpoint after crashed right after promoted. |
Date: | 2022-03-16 08:21:46 |
Message-ID: | C8AD8B0B-7914-4D4A-96C3-B8CF724C51C2@outlook.com |
Lists: | pgsql-hackers |
This issue turns out to be a duplicate of [1]. After applying that patch, I can no longer reproduce it.
[1] https://www.postgresql.org/message-id/flat/20220316.102444.2193181487576617583.horikyota.ntt%40gmail.com
On March 16, 2022, at 3:16 PM, hao harry <harry-hao(at)outlook(dot)com> wrote:
Hi, pgsql-hackers,
I think I have found a case where the database becomes unrecoverable. Could you please take a look?
Here is how it happens:
- Set up a primary and a standby.
- Run a lot of INSERTs on the primary.
- Create a checkpoint on the primary.
- Wait until the standby starts a restart point; syncing buffers takes about 3 minutes to complete.
- Before the restart point updates ControlFile, promote the standby. Promotion changes ControlFile->state to DB_IN_PRODUCTION, which makes the restart point skip its ControlFile update, leaving ControlFile->checkPoint pointing to a removed file.
- Before the promoted standby requests the post-recovery checkpoint (fast promotion), one backend crashes. This kills the other server processes, so the post-recovery checkpoint is skipped.
- The database restarts the startup process, which reports: "could not locate a valid checkpoint record"
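To make the ordering in the steps above easier to follow, here is a toy model of the race in Python. It is only a sketch, not PostgreSQL code: the class and function names (ToyControlFile, ToyStandby, simulate) and the integer WAL segment numbers are made up for illustration, and the only behavior modeled is what the steps describe, namely that the restart point skips the ControlFile update once the state is DB_IN_PRODUCTION while the old WAL file is still removed.

```python
from dataclasses import dataclass, field

DB_IN_ARCHIVE_RECOVERY = "DB_IN_ARCHIVE_RECOVERY"
DB_IN_PRODUCTION = "DB_IN_PRODUCTION"


@dataclass
class ToyControlFile:
    """Hypothetical stand-in for the two control-file fields in the report."""
    state: str = DB_IN_ARCHIVE_RECOVERY
    check_point: int = 1  # toy WAL segment number holding the recorded checkpoint


@dataclass
class ToyStandby:
    control: ToyControlFile = field(default_factory=ToyControlFile)
    wal_segments: set = field(default_factory=lambda: {1, 2})

    def promote(self) -> None:
        # Promotion flips the state before the post-recovery checkpoint runs.
        self.control.state = DB_IN_PRODUCTION

    def finish_restart_point(self, new_segment: int) -> None:
        # The control-file update happens only while still in archive recovery.
        if self.control.state == DB_IN_ARCHIVE_RECOVERY:
            self.control.check_point = new_segment
        # Older WAL is removed either way (assumption taken from the report).
        self.wal_segments = {s for s in self.wal_segments if s >= new_segment}

    def can_locate_checkpoint(self) -> bool:
        # Crash recovery needs the segment that check_point refers to.
        return self.control.check_point in self.wal_segments


def simulate(promote_before_update: bool) -> bool:
    """Return True if a restart after a crash can still find its checkpoint."""
    standby = ToyStandby()
    if promote_before_update:
        standby.promote()
    standby.finish_restart_point(new_segment=2)
    # A backend crash here means no post-recovery checkpoint repairs the pointer.
    return standby.can_locate_checkpoint()


if __name__ == "__main__":
    print("restart point finishes first:", simulate(False))
    print("promotion happens first:", simulate(True))
```

In this model, promoting before the restart point finishes leaves check_point referring to segment 1 after that segment has been removed, which corresponds to the "could not locate a valid checkpoint record" failure at restart.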
I have attached a test to reproduce it. It does not fail every time; for me it fails about once in every 10 runs.
To increase the chance that CreateRestartPoint skips the ControlFile update, and to simulate a crash,
patch 0001 is needed.
Best regards,
Harry Hao
<0001-Patched-CreateRestartPoint-to-reproduce-invalid-chec.patch><reprod_crash_right_after_promoted.pl>