Assertion failure in WaitForWALToBecomeAvailable state machine

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Assertion failure in WaitForWALToBecomeAvailable state machine
Date: 2022-02-11 10:02:45
Message-ID: CAFiTN-sE3ry=ycMPVtC+Djw4Fd7gbUGVv_qqw6qfzp=JLvqT3g@mail.gmail.com
Lists: pgsql-hackers

Hi,

The problem is that whenever we switch to streaming we always set
XLogCtl->InstallXLogFileSegmentActive to true, but when switching
from streaming to archive we do not always reset it, so in some cases
we hit the assertion. The flag is reset inside XLogShutdownWalRcv(),
but when leaving streaming mode we only call that function if
WalRcvStreaming() returns true. It is quite possible that, before we
call WalRcvStreaming(), the walreceiver has already set
walrcv->walRcvState to WALRCV_STOPPED (for example, after exiting due
to a timeout), in which case WalRcvStreaming() returns false. I agree
that we don't want to shut down the walreceiver in that case, but then
who resets the flag?

I just ran some tests on primary: I attached gdb to the walreceiver,
waited for it to exit with a timeout, and then the recovery process
hit the assertion.

2022-02-11 14:33:56.976 IST [60978] FATAL: terminating walreceiver due to timeout
cp: cannot stat ‘/home/dilipkumar/work/PG/install/bin/wal_archive/00000002.history’: No such file or directory
2022-02-11 14:33:57.002 IST [60973] LOG: restored log file "000000010000000000000003" from archive
TRAP: FailedAssertion("!XLogCtl->InstallXLogFileSegmentActive", File: "xlog.c", Line: 3823, PID: 60973)

I have applied a quick fix that solves the issue: if the last failed
source was streaming and WalRcvStreaming() returns false, just reset
the flag.

@@ -12717,6 +12717,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					 */
 					if (WalRcvStreaming())
 						XLogShutdownWalRcv();
+					else
+					{
+						LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+						XLogCtl->InstallXLogFileSegmentActive = false;
+						LWLockRelease(ControlFileLock);
+					}

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
