Re: standby promotion can create unreadable WAL

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: standby promotion can create unreadable WAL
Date: 2022-08-24 05:39:44
Message-ID: CAFiTN-t7umki=PK8dT1tcPV=mOUe2vNhHML6b3T7W7qqvvajjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 23, 2022 at 12:06 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Nothing that uses xlogreader is going to be able to bridge the gap
> between file #4 and file #5. In this case it doesn't matter very much,
> because we immediately write a checkpoint record into file #5, so if
> we crash we won't try to replay file #4 anyway. However, if anything
> did try to look at file #4 it would get confused. Maybe that can
> happen if this is a streaming standby, where we only write an
> end-of-recovery record upon promotion, rather than a checkpoint, or
> maybe if there are cascading standbys someone could try to actually
> use the 000000020000000000000004 file for something. I'm not sure. But
> unless I'm missing something, that file is bogus, and our only hope of
> not having problems is that perhaps no one will ever look at it.

Yeah, this analysis looks correct to me.

> I think that the cause of this problem is this code right here:
>
> /*
> * Actually, if WAL ended in an incomplete record, skip the parts that
> * made it through and start writing after the portion that persisted.
> * (It's critical to first write an OVERWRITE_CONTRECORD message, which
> * we'll do as soon as we're open for writing new WAL.)
> */
> if (!XLogRecPtrIsInvalid(missingContrecPtr))
> {
> Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
> EndOfLog = missingContrecPtr;
> }

Yeah, this statement as well as another statement that creates the
overwrite contrecord. After changing these two lines the problem is
fixed for me. Although I haven't yet thought of all the scenarios
that whether it is safe in all the cases. I agree that after timeline
changes we are pointing to the end of the last valid record we can
start writing the next record from that point onward. But I think we
should need to think hard that whether it will break any case for
which the overwrite contrecord was actually introduced.

diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 7602fc8..3d38613 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5491,7 +5491,7 @@ StartupXLOG(void)
* (It's critical to first write an OVERWRITE_CONTRECORD message, which
* we'll do as soon as we're open for writing new WAL.)
*/
- if (!XLogRecPtrIsInvalid(missingContrecPtr))
+ if (newTLI == endOfRecoveryInfo->lastRecTLI &&
!XLogRecPtrIsInvalid(missingContrecPtr))
{
Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
EndOfLog = missingContrecPtr;
@@ -5589,7 +5589,7 @@ StartupXLOG(void)
LocalSetXLogInsertAllowed();

/* If necessary, write overwrite-contrecord before doing
anything else */
- if (!XLogRecPtrIsInvalid(abortedRecPtr))
+ if (newTLI == endOfRecoveryInfo->lastRecTLI &&
!XLogRecPtrIsInvalid(abortedRecPtr))
{
Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
CreateOverwriteContrecordRecord(abortedRecPtr,
missingContrecPtr, newTLI);

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2022-08-24 06:37:09 Re: Schema variables - new implementation for Postgres 15
Previous Message John Naylor 2022-08-24 04:59:25 Re: [PATCH] Optimize json_lex_string by batching character copying