From: | vignesh ravichandran <admin(at)viggy28(dot)dev> |
---|---|
To: | "James Coleman" <jtc331(at)gmail(dot)com> |
Cc: | "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_rewind: warn when checkpoint hasn't happened after promotion |
Date: | 2022-06-07 14:41:11 |
Message-ID: | 1813e9cca21.c475f47b793218.4858147397643612577@viggy28.dev |
Lists: | pgsql-hackers |
I think this is a good improvement, and I also like the option (in pg_rewind) to potentially issue a checkpoint on the source.
Personal anecdote: I was using stolon and failing over frequently. For some time the rewind was failing with a report that it wasn't required. Only later did I learn that the cause was a missing checkpoint on the source.
References https://github.com/sorintlab/stolon/issues/601
And the fix https://github.com/sorintlab/stolon/pull/644
---- On Sat, 04 Jun 2022 05:59:12 -0700 James Coleman <mailto:jtc331(at)gmail(dot)com> wrote ----
A few weeks back I sent a bug report [1] directly to the -bugs mailing
list, and I haven't seen any activity on it (maybe this is because I
emailed directly instead of using the form?), but I got some time to
take a look and concluded that a first-level fix is pretty simple.
A quick background refresher: after promoting a standby, rewinding the
former primary requires that a checkpoint have been completed on the
new primary after promotion. This is correctly documented. However,
pg_rewind incorrectly reports to the user that a rewind isn't
necessary because the source and target are on the same timeline.
Specifically, this happens when the control file on the newly promoted
server looks like:
Latest checkpoint's TimeLineID: 4
Latest checkpoint's PrevTimeLineID: 4
...
Min recovery ending loc's timeline: 5
Attached is a patch that detects this condition and reports it as an
error to the user.
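
To make the condition concrete, here is a minimal Python sketch that checks `pg_controldata`-style output for the state shown above. It is not the attached patch (which changes pg_rewind itself); the function names and the comparison heuristic (min-recovery timeline greater than the latest checkpoint's timeline) are assumptions chosen for illustration, based only on the example values in this message.

```python
def parse_controldata(text):
    """Parse 'label: value' lines (as printed by pg_controldata) into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as timestamps may contain colons.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def checkpoint_pending_after_promotion(text):
    """Return True if the control data looks like a promoted server that has
    not yet completed a checkpoint on its new timeline.

    Assumed heuristic (hypothetical, for illustration): the min-recovery
    timeline is ahead of the latest checkpoint's timeline.
    """
    fields = parse_controldata(text)
    checkpoint_tli = int(fields["Latest checkpoint's TimeLineID"])
    min_recovery_tli = int(fields["Min recovery ending loc's timeline"])
    return min_recovery_tli > checkpoint_tli


# The excerpt from this message: checkpoint still on timeline 4,
# while recovery ended on timeline 5.
SAMPLE = """\
Latest checkpoint's TimeLineID: 4
Latest checkpoint's PrevTimeLineID: 4
...
Min recovery ending loc's timeline: 5
"""

if __name__ == "__main__":
    if checkpoint_pending_after_promotion(SAMPLE):
        print("source has not checkpointed since promotion; run CHECKPOINT first")
```

In practice the input text would come from running `pg_controldata` against the source's data directory; a script wrapping pg_rewind could run this check first and issue a `CHECKPOINT` on the new primary before retrying.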
In the spirit of the new-ish "ensure shutdown" functionality I could
imagine extending this to automatically issue a checkpoint when this
situation is detected. I haven't started to code that up, however,
wanting to first get buy-in on that.
Thanks,
James Coleman
1: https://www.postgresql.org/message-id/CAAaqYe8b2DBbooTprY4v=BiZEd9qBqVLq+FD9j617eQFjk1KvQ@mail.gmail.com