Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Date: 2022-06-20 07:13:43
Message-ID: YrAeJw+DjSFkrgG1@paquier.xyz
Lists: pgsql-hackers

On Fri, May 27, 2022 at 07:01:37PM +0000, Imseih (AWS), Sami wrote:
> What we found:
>
> 1. missingContrecPtr is set when
> StandbyMode is false, therefore
> only a writer should set this value
> and a record is then sent downstream.
>
> But a standby going through crash
> recovery will always have StandbyMode = false,
> causing the missingContrecPtr to be incorrectly
> set.

That stands as true as far as I know: StandbyMode would be switched
only once we get out of crash recovery, and only if archive recovery
completes when there is a restore_command.

> 2. If StandbyModeRequested is checked instead,
> we ensure that a standby will not set a
> missingContrecPtr.
>
> 3. After applying the patch below, the tap test succeeded

Hmm. I have not looked at that in depth, but if the intention is to
check that the database is able to write WAL, looking at
XLogCtl->SharedRecoveryState would be the way to go because that's the
flip switching between crash recovery, archive recovery and the end of
recovery (when WAL can be safely written).
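To make the distinction concrete, here is a minimal standalone sketch of
that kind of check. The enum mirrors RecoveryState from access/xlog.h as
far as I recall it; wal_writes_allowed() is a hypothetical helper named
only for illustration, not an existing backend function, and the real
code would read the state from shared memory under the proper lock
rather than take it as an argument.

```c
#include <stdbool.h>

/*
 * Standalone copy of the recovery states tracked in
 * XLogCtl->SharedRecoveryState (modeled on access/xlog.h).
 */
typedef enum RecoveryState
{
	RECOVERY_STATE_CRASH = 0,	/* crash recovery */
	RECOVERY_STATE_ARCHIVE,		/* archive recovery */
	RECOVERY_STATE_DONE			/* recovery finished, in production */
} RecoveryState;

/*
 * Hypothetical helper: WAL can be safely written only once recovery
 * has ended.  Unlike a test on StandbyMode, this stays false for a
 * standby that is still in crash recovery.
 */
static bool
wal_writes_allowed(RecoveryState state)
{
	return state == RECOVERY_STATE_DONE;
}
```

A standby going through crash recovery would be in RECOVERY_STATE_CRASH
here, so a condition built on this would not treat it as a writer.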

The check in xlogrecovery_redo() still looks like a good thing to have
anyway, because we know that we can safely skip the contrecord. Now,
for any patch produced, could the existing TAP test be extended so that
we are able to get a PANIC even if we keep around the sanity check in
xlogrecovery_redo()? That would likely involve an immediate shutdown
of a standby followed by a start sequence?
--
Michael
