Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: simseih(at)amazon(dot)com
Cc: michael(at)paquier(dot)xyz, alvherre(at)alvh(dot)no-ip(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Date: 2022-08-08 04:06:54
Message-ID: 20220808.130654.541433441863454305.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 5 Aug 2022 21:28:16 +0000, "Imseih (AWS), Sami" <simseih(at)amazon(dot)com> wrote in
> > Would you mind trying the second attached patch to obtain a detailed log
> > in your testing environment? With the patch, the modified TAP test yields
> > log lines like the ones below.
>
> I applied the logging patch to 13.7 (attached is the backport) and reproduced the
> issue.
>
> I stripped out the relevant parts of the file. Let me know if this is
> helpful.

Thank you very much!

> postgresql.log.2022-08-05-17:2022-08-05 17:18:51 UTC::@:[359]:LOG: ### [F] @0/10000000: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1
> postgresql.log.2022-08-05-17:2022-08-05 17:22:21 UTC::@:[359]:LOG: ### [S] @0/10000060: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1

The server seems to have been started as a standby after the primary
crashed. Is that correct?

> postgresql.log.2022-08-05-18:2022-08-05 18:38:14 UTC::@:[359]:LOG: ### [F] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=1, SbyModeReq=1
> postgresql.log.2022-08-05-18:2022-08-05 18:38:14 UTC::@:[359]:LOG: ### [S] @6/B6CB27D0: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1

Archive recovery ended here, so the server should have been promoted at
that point. Do you see any interesting log lines around this time?

> postgresql.log.2022-08-05-18:2022-08-05 18:50:13 UTC::@:[359]:LOG: ### [S] @6/B8000198: abort=(0/0)0/0, miss=(0/0)0/0, SbyMode=0, SbyModeReq=1

However, recovery continues in non-standby mode. I don't see why it
behaves that way.

> postgresql.log.2022-08-05-18:2022-08-05 18:50:20 UTC::@:[359]:LOG: ### [A] @6/F3FFFF20: abort=(6/F3FFFF20)0/0, miss=(6/F4000000)0/0, SbyMode=0, SbyModeReq=1
> postgresql.log.2022-08-05-18:2022-08-05 18:50:20 UTC::@:[359]:LOG: ### [S] @6/F4000030: abort=(0/0)6/F3FFFF20, miss=(0/0)6/F4000000, SbyMode=1, SbyModeReq=1

Then archive recovery starts again.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
