Re: Assertion failure in WaitForWALToBecomeAvailable state machine

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Assertion failure in WaitForWALToBecomeAvailable state machine
Date: 2022-09-10 02:22:01
Message-ID: CALj2ACUCt1+BmgP=8Th=y3RVWb3fO9AyXp=NzkDyGhp8Uv6_ZQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 15, 2022 at 11:30 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Thu, Aug 11, 2022 at 10:06 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > Today I encountered the assertion failure [2] twice while working on
> > another patch [1]. The pattern seems to be that the walreceiver got
> > killed or crashed and set its state to WALRCV_STOPPING or
> > WALRCV_STOPPED by the time the WAL state machine moves to archive, and
> > hence the following XLogShutdownWalRcv() code will not get hit:
> >
> >     /*
> >      * Before we leave XLOG_FROM_STREAM state, make sure that
> >      * walreceiver is not active, so that it won't overwrite
> >      * WAL that we restore from archive.
> >      */
> >     if (WalRcvStreaming())
> >         ShutdownWalRcv();
> >
> > I agree with Kyotaro-san that we should reset
> > InstallXLogFileSegmentActive before entering archive mode. Hence I
> > tweaked the code introduced by the following commit a bit; the
> > resulting v1 patch is attached herewith. Please review it.
>
> I added it to the current commitfest to not lose track of it:
> https://commitfest.postgresql.org/39/3814/.

Today I spent some more time on this issue and modified the v1 patch
posted upthread a bit: it now resets InstallXLogFileSegmentActive
only when the WAL source switches to archive, not every time we are in
archive mode.
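
To illustrate the idea (this is a simplified model, not the patch
itself), here is a minimal sketch in plain C. All names in it
(RecoveryModel, start_streaming, walreceiver_exits, enter_archive,
can_restore_from_archive) are hypothetical stand-ins and not PostgreSQL
symbols; the install_active field only models
InstallXLogFileSegmentActive. It assumes the flag is set when streaming
starts and must be clear before restoring from archive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are real PostgreSQL symbols. */
typedef enum { SOURCE_ARCHIVE, SOURCE_STREAM } WalSource;

typedef struct
{
    WalSource source;                /* current WAL source */
    bool      walreceiver_streaming; /* models WalRcvStreaming() */
    bool      install_active;        /* models InstallXLogFileSegmentActive */
    int       resets;                /* how many times the flag was cleared */
} RecoveryModel;

/* Start streaming: the walreceiver may now install WAL segments. */
static void
start_streaming(RecoveryModel *m)
{
    m->source = SOURCE_STREAM;
    m->walreceiver_streaming = true;
    m->install_active = true;
}

/* The walreceiver exits on its own; nobody clears the flag for it. */
static void
walreceiver_exits(RecoveryModel *m)
{
    m->walreceiver_streaming = false;
}

/* Move to the archive source, clearing the flag only on the transition. */
static void
enter_archive(RecoveryModel *m)
{
    if (m->source != SOURCE_ARCHIVE)
    {
        /* Not gated on walreceiver_streaming: it may already be false. */
        m->walreceiver_streaming = false;
        m->install_active = false;
        m->resets++;
    }
    m->source = SOURCE_ARCHIVE;
}

/* Restoring from archive expects that no one else installs segments. */
static bool
can_restore_from_archive(const RecoveryModel *m)
{
    return m->source == SOURCE_ARCHIVE && !m->install_active;
}
```

In this model the flag is cleared once on the stream-to-archive switch
even if the walreceiver had already stopped, and staying in archive
does not clear it again.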

I'm attaching the v2 patch herewith; please review it further.

Just for the record, there's another report of the assertion failure
at [1]. Many thanks to Kyotaro-san for providing steps that reproduce
it consistently.

[1] - https://www.postgresql.org/message-id/flat/20220909.172949.2223165886970819060.horikyota.ntt%40gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v2-0001-Avoid-race-condition-in-resetting-XLogCtl-Install.patch application/octet-stream 6.0 KB
