|From:||Michael Paquier <michael(at)paquier(dot)xyz>|
|To:||Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>|
|Cc:||Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: PANIC during crash recovery of a recently promoted standby|
On Mon, May 14, 2018 at 01:14:22PM +0530, Pavan Deolasee wrote:
> Looks like I didn't understand Alvaro's comment when he mentioned it to me
> off-list. But I now see what Michael and Alvaro mean and that indeed seems
> like a problem. I was thinking that the test for (ControlFile->state ==
> DB_IN_ARCHIVE_RECOVERY) will ensure that minRecoveryPoint can't be updated
> after the standby is promoted. While that's true for a DB_IN_PRODUCTION, the
> RestartPoint may finish after we have written end-of-recovery record, but
> before we're in production and thus the minRecoveryPoint may again be set.
Yeah, this is something I considered at first as well, but I was not
confident enough that setting minRecoveryPoint to InvalidXLogRecPtr
was actually a safe thing for timeline switches.
So I have spent a good portion of today testing and playing with it to
be confident enough that this is right, and I have ended up with the
attached patch. It adds a new flag to XLogCtl which tracks whether the
control file has been updated after the end-of-recovery record has been
written, so that minRecoveryPoint does not get updated by a restart
point running in parallel.
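To make the idea concrete, here is a minimal, self-contained sketch of the
guard, not the patch itself. The Toy* types, the toy_* functions and the flag
name endOfRecoveryWritten are hypothetical stand-ins for the real control file
and XLogCtl code, and resetting minRecoveryPoint to InvalidXLogRecPtr at end
of recovery is an assumption taken from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef enum { DB_IN_ARCHIVE_RECOVERY, DB_IN_PRODUCTION } DBState;

/* Toy stand-in for the relevant control file fields. */
typedef struct
{
    DBState     state;
    XLogRecPtr  minRecoveryPoint;
} ToyControlFile;

/* Toy stand-in for shared state; the flag name is hypothetical. */
typedef struct
{
    bool        endOfRecoveryWritten;
} ToyXLogCtl;

/*
 * Called when the end-of-recovery record is written (assumed behavior):
 * reset minRecoveryPoint and remember that the control file was updated.
 */
static void
toy_write_end_of_recovery(ToyControlFile *cf, ToyXLogCtl *ctl)
{
    cf->minRecoveryPoint = InvalidXLogRecPtr;
    ctl->endOfRecoveryWritten = true;
}

/*
 * Called by a (possibly concurrent) restart point.  Returns true if
 * minRecoveryPoint was advanced.  The state check alone is not enough,
 * because the state is still DB_IN_ARCHIVE_RECOVERY for a short while
 * after the end-of-recovery record has been written.
 */
static bool
toy_update_min_recovery_point(ToyControlFile *cf, const ToyXLogCtl *ctl,
                              XLogRecPtr lsn)
{
    if (cf->state != DB_IN_ARCHIVE_RECOVERY)
        return false;
    if (ctl->endOfRecoveryWritten)
        return false;           /* the extra guard this sketch illustrates */
    if (lsn <= cf->minRecoveryPoint)
        return false;
    cf->minRecoveryPoint = lsn;
    return true;
}
```

In this model, a restart point that finishes after
toy_write_end_of_recovery() leaves minRecoveryPoint untouched, even though
the state has not yet switched to DB_IN_PRODUCTION.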
I have also reworked the test case you sent, removing the manual sleeps
and replacing them with correct wait points. There is also no point in
waiting after promotion as pg_ctl promote implies a wait. Another
important thing is that you need to use wal_log_hints = off to see a
crash, while allows_streaming actually turns wal_log_hints on.
Comments are welcome.