Re: standby promotion can create unreadable WAL

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: dilipbalaut(at)gmail(dot)com
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, alvherre(at)alvh(dot)no-ip(dot)org, andres(at)anarazel(dot)de
Subject: Re: standby promotion can create unreadable WAL
Date: 2022-08-24 08:40:07
Message-ID: 20220824.174007.2207040351678986759.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Nice find!

At Wed, 24 Aug 2022 11:09:44 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> On Tue, Aug 23, 2022 at 12:06 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> > Nothing that uses xlogreader is going to be able to bridge the gap
> > between file #4 and file #5. In this case it doesn't matter very much,
> > because we immediately write a checkpoint record into file #5, so if
> > we crash we won't try to replay file #4 anyway. However, if anything
> > did try to look at file #4 it would get confused. Maybe that can
> > happen if this is a streaming standby, where we only write an
> > end-of-recovery record upon promotion, rather than a checkpoint, or
> > maybe if there are cascading standbys someone could try to actually
> > use the 000000020000000000000004 file for something. I'm not sure. But
> > unless I'm missing something, that file is bogus, and our only hope of
> > not having problems is that perhaps no one will ever look at it.
>
> Yeah, this analysis looks correct to me.

(I didn't reproduce the case but understand what is happening.)

Me, too. There are two ways to deal with this, I think. One is start
writing new records from abortedContRecPtr as if it were not
exist. Another is copying WAL file up to missingContRecPtr. Since the
first segment of the new timeline doesn't need to be identcal to the
last one of the previous timeline, so I think the former way is
cleaner. XLogInitNewTimeline or near seems to be be the place for fix
to me. Clearing abortedRecPtr and missingContrecPtr just before the
call to findNewestTimeLine will work?

====
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 87b243e0d4..27e01153e7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5396,6 +5396,13 @@ StartupXLOG(void)
*/
XLogInitNewTimeline(EndOfLogTLI, EndOfLog, newTLI);

+ /*
+ * EndOfLog doesn't cover aborted contrecord even if the last record
+ * was that, then the next timeline starts writing from there. Forget
+ * about aborted and missing contrecords even if any.
+ */
+ abortedRecPtr = missingContrecPtr = InvalidXLogRecPtr;
+
/*
* Remove the signal files out of the way, so that we don't
* accidentally re-enter archive recovery mode in a subsequent crash.
====

> > I think that the cause of this problem is this code right here:
> >
> > /*
> > * Actually, if WAL ended in an incomplete record, skip the parts that
> > * made it through and start writing after the portion that persisted.
> > * (It's critical to first write an OVERWRITE_CONTRECORD message, which
> > * we'll do as soon as we're open for writing new WAL.)
> > */
> > if (!XLogRecPtrIsInvalid(missingContrecPtr))
> > {
> > Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
> > EndOfLog = missingContrecPtr;
> > }
>
> Yeah, this statement as well as another statement that creates the
> overwrite contrecord. After changing these two lines the problem is
> fixed for me. Although I haven't yet thought of all the scenarios
> that whether it is safe in all the cases. I agree that after timeline
> changes we are pointing to the end of the last valid record we can
> start writing the next record from that point onward. But I think we
> should need to think hard that whether it will break any case for
> which the overwrite contrecord was actually introduced.
>
> diff --git a/src/backend/access/transam/xlog.c
> b/src/backend/access/transam/xlog.c
> index 7602fc8..3d38613 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -5491,7 +5491,7 @@ StartupXLOG(void)
> * (It's critical to first write an OVERWRITE_CONTRECORD message, which
> * we'll do as soon as we're open for writing new WAL.)
> */
> - if (!XLogRecPtrIsInvalid(missingContrecPtr))
> + if (newTLI == endOfRecoveryInfo->lastRecTLI &&
> !XLogRecPtrIsInvalid(missingContrecPtr))
> {
> Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
> EndOfLog = missingContrecPtr;
> @@ -5589,7 +5589,7 @@ StartupXLOG(void)
> LocalSetXLogInsertAllowed();
>
> /* If necessary, write overwrite-contrecord before doing
> anything else */
> - if (!XLogRecPtrIsInvalid(abortedRecPtr))
> + if (newTLI == endOfRecoveryInfo->lastRecTLI &&
> !XLogRecPtrIsInvalid(abortedRecPtr))
> {
> Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
> CreateOverwriteContrecordRecord(abortedRecPtr,
> missingContrecPtr, newTLI);

This also seems to work, of course.

# However, I haven't managed to reproduce that, yet...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2022-08-24 08:59:23 Re: ICU for global collation
Previous Message talk to ben 2022-08-24 08:05:55 Re: archive modules