Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Date: 2022-02-22 20:16:41
Message-ID: 202202222016.3nar64wc7xs7@alvherre.pgsql
Lists: pgsql-hackers

On 2022-Feb-22, Imseih (AWS), Sami wrote:

> On 13.5 a wal flush PANIC is encountered after a standby is promoted.
>
> With debugging, it was found that when a standby skips a missing
> continuation record on recovery, the missingContrecPtr is not
> invalidated after the record is skipped. Therefore, when the standby
> is promoted to a primary it writes an overwrite_contrecord with an LSN
> of the missingContrecPtr, which is now in the past. On flush time,
> this causes a PANIC. From what I can see, this failure scenario can
> only occur after a standby is promoted.

Ooh, nice find and diagnosis. I can confirm that the test fails as you
described without the code fix, and doesn't fail with it.

I attach the same patch, with the test file put in its final place
rather than as a patch. Due to recent xlog.c changes this needs a bit of
work to apply to back branches; I'll see about getting it in all
branches soon.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php

Attachment Content-Type Size
v2-0001-Fix-missing-continuation-record-after-standby-pro.patch text/x-diff 5.7 KB
