Re: Assertion failure in WaitForWALToBecomeAvailable state machine

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: Assertion failure in WaitForWALToBecomeAvailable state machine
Date: 2022-09-12 07:00:29
Message-ID: Yx7ZDSvjmmPuF5Sd@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 10, 2022 at 07:52:01AM +0530, Bharath Rupireddy wrote:
> Today, I spent some more time on this issue, I modified the v1 patch
> posted upthread a bit - now resetting the InstallXLogFileSegmentActive
> only when the WAL source switched to archive, not every time in
> archive mode.
>
> I'm attaching v2 patch here with, please review it further.
>
> Just for the records - there's another report of the assertion failure
> at [1], many thanks to Kyotaro-san for providing consistent
> reproducible steps.
>
> [1] - https://www.postgresql.org/message-id/flat/20220909.172949.2223165886970819060.horikyota.ntt%40gmail.com

Well, the fact that cc2c7d6 is involved here makes this thread an open
item for PG15 as far as I can see, assigned to Noah (added now in
CC).

While reading your last patch, I have found rather confusing that we
only reset InstallXLogFileSegmentActive when the current source is the
archives and it does not match the old source. This code is already
complicated, and I don't think that having more assumptions in its
internals is a good thing when it comes to its long-term maintenance.
In short, HEAD is rather conservative when it comes to set
InstallXLogFileSegmentActive, switching it only when we request
streaming with RequestXLogStreaming(), but too aggressive when it
comes to reset it and we want something in the middle ground. FWIW, I
find better the approach taken by Horiguchi-san in [1] to reset the
state before we attempt to read WAL from the archives *or* pg_wal,
after we know that the last source has failed, where we know that we
are not streaming yet (but recovery may be requested soon).

Side note: I don't see much the point of having two routines named
SetInstallXLogFileSegmentActive and ResetInstallXLogFileSegmentActive
that do the opposite thing. We could just have one.

[1]: https://www.postgresql.org/message-id/20220214.171428.735280610520122187.horikyota.ntt@gmail.com
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-09-12 07:33:39 Re: pg_upgrade generated files in subdir follow-up
Previous Message Peter Eisentraut 2022-09-12 06:53:53 Re: Expand palloc/pg_malloc API