From: | "Imseih (AWS), Sami" <simseih(at)amazon(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [BUG] Panic due to incorrect missingContrecPtr after promotion |
Date: | 2022-02-25 21:28:18 |
Message-ID: | EF51690F-5006-47C4-9473-9DFAFC0B1778@amazon.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> After some investigation, I finally concluded that we should reset
> abortedRecPtr and missingContrecPtr at processing
> XLOG_OVERWRITE_CONTRECORD.
> --- a/src/backend/access/transam/xlogrecovery.c
> +++ b/src/backend/access/transam/xlogrecovery.c
> @@ -1953,6 +1953,11 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
timestamptz_to_str(xlrec.overwrite_time))));
> + /* We have safely skipped the aborted record */
> + abortedRecPtr = InvalidXLogRecPtr;
> + missingContrecPtr = InvalidXLogRecPtr;
> +
> /* Verifying the record should only happen once */
> record->overwrittenRecPtr = InvalidXLogRecPtr;
> }
> The last check in the test against "resetting aborted record" is no
> longer useful since it is already checked by
> 026_verwrite_contrecord.pl.
> regards.
+1 for this. Resetting abortedRecPtr and missingContrecPtr after the broken record is skipped in is cleaner. I also think it would make sense to move the " successfully skipped missing contrecord" to after we invalidate the aborted and missing record pointers, like below.
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1948,15 +1948,15 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
LSN_FORMAT_ARGS(record->overwrittenRecPtr));
+ /* We have safely skipped the aborted record */
+ abortedRecPtr = InvalidXLogRecPtr;
+ missingContrecPtr = InvalidXLogRecPtr;
+
ereport(LOG,
(errmsg("successfully skipped missing contrecord at %X/%X, overwritten at %s",
LSN_FORMAT_ARGS(xlrec.overwritten_lsn),
timestamptz_to_str(xlrec.overwrite_time))));
Also, instead of a new test, the current 026_overwrite_contrecord.pl test can include a promotion test at the end.
Attaching a new patch for review.
--
Sami Imseih
Amazon Web Services
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Fix-missing-continuation-record-after-standby-promot.patch | application/octet-stream | 2.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Melanie Plageman | 2022-02-25 21:31:23 | Re: why do hash index builds use smgrextend() for new splitpoint pages |
Previous Message | Andres Freund | 2022-02-25 21:26:49 | Re: [PATCH] Expose port->authn_id to extensions and triggers |