Re: Assertion failure in WaitForWALToBecomeAvailable state machine

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: dilipbalaut(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Assertion failure in WaitForWALToBecomeAvailable state machine
Date: 2022-02-14 08:14:28
Message-ID: 20220214.171428.735280610520122187.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 11 Feb 2022 22:25:49 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> > > I don't think
> > > just making InstallXLogFileSegmentActive false is enough. By looking
> > > at the comment [1], it doesn't make sense to move ahead for restoring
> > > from the archive location without the WAL receiver fully stopped.
> > > IMO, the real fix is to just remove WalRcvStreaming() and call
> > > XLogShutdownWalRcv() unconditionally. Anyways, we have the
> > > Assert(!WalRcvStreaming()); down below. I don't think it will create
> > > any problem.
> >
> > If WalRcvStreaming() is returning false that means walreceiver is
> > already stopped so we don't need to shutdown it externally. I think
> > like we are setting this flag outside start streaming we can reset it
> > also outside XLogShutdownWalRcv. Or I am fine even if we call
> > XLogShutdownWalRcv() because if walreceiver is stopped it will just
> > reset the flag we want it to reset and it will do nothing else.
>
> As I said, I'm okay with just calling XLogShutdownWalRcv()
> unconditionally as it just returns in case walreceiver has already
> stopped and updates the InstallXLogFileSegmentActive flag to false.
>
> Let's also hear what other hackers have to say about this.

Firstly, good catch:) And the direction seems right.

It looks like an oversight in cc2c7d65fc. Installing new WAL segments
is prohibited only while we're in archive recovery. Conversely, we must
turn it off when entering archive recovery (not when exiting streaming
recovery). So *I* would rather do that at the beginning of
XLOG_FROM_ARCHIVE/PG_WAL than at the end of XLOG_FROM_STREAM.

(I would also like to remove XLogShutdownWalRcv() and turn off the flag
in StartupXLOG explicitly, but that would be overdoing it.)
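
To make the reasoning above concrete, here is a toy model in C. It is only a sketch: ToyRecovery and every function in it are hypothetical names made up for illustration, not PostgreSQL code. walrcv_running stands in for what WalRcvStreaming() reports, and install_segment_active stands in for InstallXLogFileSegmentActive. It models the case where the walreceiver exits on its own, so a shutdown guarded by "is the receiver still running?" never clears the flag.

```c
#include <stdbool.h>

/* Toy model only: names mirror the thread but none of this is PostgreSQL code. */
typedef enum { SRC_ARCHIVE, SRC_STREAM } WalSource;

typedef struct {
    WalSource source;
    bool walrcv_running;          /* stands in for WalRcvStreaming() */
    bool install_segment_active;  /* stands in for InstallXLogFileSegmentActive */
} ToyRecovery;

/* Entering streaming: the receiver starts and segment installation is allowed. */
static void enter_stream(ToyRecovery *r)
{
    r->source = SRC_STREAM;
    r->walrcv_running = true;
    r->install_segment_active = true;
}

/* The receiver exits by itself (e.g. a timeout); nobody resets the flag. */
static void walrcv_voluntary_exit(ToyRecovery *r)
{
    r->walrcv_running = false;
}

/* Reset at the end of streaming, but only if the receiver is still running. */
static void leave_stream_guarded(ToyRecovery *r)
{
    if (r->walrcv_running) {
        r->walrcv_running = false;
        r->install_segment_active = false;
    }
}

/* Entering archive recovery, optionally clearing the flag unconditionally. */
static void enter_archive(ToyRecovery *r, bool reset_on_entry)
{
    r->source = SRC_ARCHIVE;
    if (reset_on_entry)
        r->install_segment_active = false;
}

/* Returns the flag value seen once archive recovery has been entered. */
static bool flag_in_archive(bool reset_on_entry)
{
    ToyRecovery r = { SRC_ARCHIVE, false, false };

    enter_stream(&r);
    walrcv_voluntary_exit(&r);
    leave_stream_guarded(&r);   /* skipped: the receiver is already gone */
    enter_archive(&r, reset_on_entry);
    return r.install_segment_active;
}
```

In this model flag_in_archive(false) returns true (the flag is stale when archive recovery starts), while flag_in_archive(true) returns false, because the reset on entry does not depend on how streaming ended.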

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12800,6 +12800,16 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
      */
     Assert(!WalRcvStreaming());

+    /*
+     * WAL segment installation conflicts with archive
+     * recovery. Make sure it is turned off. XLogShutdownWalRcv()
+     * does that, but it is not called when the process has exited
+     * voluntarily, for example on replication timeout.
+     */
+    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+    XLogCtl->InstallXLogFileSegmentActive = false;
+    LWLockRelease(ControlFileLock);
+
     /* Close any old file we might have open. */
     if (readFile >= 0)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
