Re: Assertion failure in WaitForWALToBecomeAvailable state machine

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Assertion failure in WaitForWALToBecomeAvailable state machine
Date: 2022-02-11 13:00:43
Message-ID: CAFiTN-uSihjSGCbES+zHfVtF2ugGE8EBejD1_tAcGh=zyY65XQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 11, 2022 at 6:22 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Fri, Feb 11, 2022 at 3:33 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >

> IIUC, the issue can happen when the walreceiver has failed to get WAL
> from the primary for whatever reason, its status is not yet
> WALRCV_STOPPING or WALRCV_STOPPED, and the startup process has moved ahead
> in WaitForWALToBecomeAvailable to read from the archive, which ends up
> in this assertion failure. ISTM this is a rare scenario, and it depends on
> what the walreceiver does between failing to get WAL from the primary and
> updating its status to WALRCV_STOPPING or WALRCV_STOPPED.
>
> If the above race condition is a serious problem, or if one thinks it
> is a problem at all, then it needs to be fixed.

I don't think we can design software that leaves race conditions open,
even if they occur only rarely :)

> I don't think
> just making InstallXLogFileSegmentActive false is enough. By looking
> at the comment [1], it doesn't make sense to move ahead for restoring
> from the archive location without the WAL receiver fully stopped.
> IMO, the real fix is to just remove WalRcvStreaming() and call
> XLogShutdownWalRcv() unconditionally. Anyways, we have the
> Assert(!WalRcvStreaming()); down below. I don't think it will create
> any problem.

If WalRcvStreaming() returns false, that means the walreceiver is
already stopped, so we don't need to shut it down externally. I think
that, just as we set this flag outside the code that starts streaming,
we can also reset it outside XLogShutdownWalRcv(). Or I am fine even if
we call XLogShutdownWalRcv() unconditionally, because if the walreceiver
is already stopped it will just reset the flag we want it to reset and
do nothing else.
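
To make the two options concrete, here is a rough standalone sketch. It
is a toy model, not the actual xlog.c code: the state variable, the
reduced enum, and the bodies of WalRcvStreaming() and
XLogShutdownWalRcv() are simplified assumptions, and the two
switch_to_archive_*() helpers are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the walreceiver state (assumption: the real
 * enum has more states than the ones modelled here). */
typedef enum
{
    WALRCV_STOPPED,
    WALRCV_STARTING,
    WALRCV_STREAMING,
    WALRCV_STOPPING
} WalRcvState;

WalRcvState walrcv_state = WALRCV_STOPPED;
bool InstallXLogFileSegmentActive = false;

/* Toy version: true only while the receiver is starting or streaming. */
bool
WalRcvStreaming(void)
{
    return walrcv_state == WALRCV_STARTING ||
           walrcv_state == WALRCV_STREAMING;
}

/* Toy version: stop the receiver (the real code waits for the process
 * to exit) and then reset the flag. */
void
XLogShutdownWalRcv(void)
{
    walrcv_state = WALRCV_STOPPED;
    InstallXLogFileSegmentActive = false;
}

/* Option 1 (hypothetical helper): keep the WalRcvStreaming() check, but
 * reset the flag ourselves when there is nothing to shut down. */
void
switch_to_archive_reset_flag(void)
{
    if (WalRcvStreaming())
        XLogShutdownWalRcv();
    else
        InstallXLogFileSegmentActive = false;
}

/* Option 2 (hypothetical helper): call the shutdown unconditionally;
 * with a stopped receiver it only resets the flag. */
void
switch_to_archive_unconditional(void)
{
    XLogShutdownWalRcv();
}

/* The conditions the archive path relies on in this model. */
void
check_archive_preconditions(void)
{
    assert(!WalRcvStreaming());
    assert(!InstallXLogFileSegmentActive);
}
```

In this model both helpers leave the flag cleared whether or not the
receiver was still running, which is the property the archive path
depends on.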

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
