Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: simseih(at)amazon(dot)com
Cc: alvherre(at)alvh(dot)no-ip(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Date: 2022-02-24 08:27:03
Message-ID: 20220224.172703.816674226135379648.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Thu, 24 Feb 2022 16:26:42 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> So, actually WAL did not end in an incomplete record. I think
> FinishWalRecovery is the last place to do that. (But it could be
> earlier.)

After some investigation, I finally concluded that we should reset
abortedRecPtr and missingContrecPtr when processing
XLOG_OVERWRITE_CONTRECORD.

--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1953,6 +1953,11 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
 						LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
 						timestamptz_to_str(xlrec.overwrite_time))));

+		/* We have safely skipped the aborted record */
+		abortedRecPtr = InvalidXLogRecPtr;
+		missingContrecPtr = InvalidXLogRecPtr;
+
 		/* Verifying the record should only happen once */
 		record->overwrittenRecPtr = InvalidXLogRecPtr;
 	}
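
To illustrate the intent of the change outside the server, here is a
minimal standalone sketch. It is a simplified model, not the actual
PostgreSQL code: the typedef, the three helper functions and the
end-of-recovery check are hypothetical stand-ins, and only the two
variable names and InvalidXLogRecPtr are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions (assumption). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Module-level state, named after the variables touched by the patch. */
static XLogRecPtr abortedRecPtr = InvalidXLogRecPtr;
static XLogRecPtr missingContrecPtr = InvalidXLogRecPtr;

/* Hypothetical helper: the reader found a record whose continuation is missing. */
static void
note_aborted_record(XLogRecPtr aborted, XLogRecPtr missing)
{
	abortedRecPtr = aborted;
	missingContrecPtr = missing;
}

/* Hypothetical helper: replay of XLOG_OVERWRITE_CONTRECORD, including the proposed reset. */
static void
redo_overwrite_contrecord(void)
{
	/* We have safely skipped the aborted record */
	abortedRecPtr = InvalidXLogRecPtr;
	missingContrecPtr = InvalidXLogRecPtr;
}

/* Hypothetical check: would end-of-recovery still act on an aborted record? */
static bool
end_of_recovery_wants_overwrite(void)
{
	if (missingContrecPtr != InvalidXLogRecPtr)
	{
		/* In this model the two pointers are always set together. */
		assert(abortedRecPtr != InvalidXLogRecPtr);
		return true;
	}
	return false;
}
```

In this model, once the overwrite record has been replayed the state is
clear, so a later promotion no longer acts on a stale missingContrecPtr.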

The last check in the test for "resetting aborted record" is no
longer useful, since that is already checked by
026_overwrite_contrecord.pl.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
