Re: [BUG] non archived WAL removed during production crash recovery

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: [BUG] non archived WAL removed during production crash recovery
Date: 2020-04-24 01:48:56
Message-ID: 20200424014856.f3jvvcxxmt5p7k5z@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Hi,

On 2020-04-24 08:54:02 +0900, Michael Paquier wrote:
> On Thu, Apr 23, 2020 at 06:59:53PM +0200, Jehan-Guillaume de Rorthais wrote:
> > Please, find in v11 for version 9.5, 9.6 and 10.
>
> I have worked more on that using your v11, tweaked few comments in the
> new test parts for 9.5~10, and applied the whole. Thanks all for your
> work. I am keeping now an eye on the buildfarm.

I am confused by this commit. You added shared memory to differentiate
between crash recovery and standby mode/archive recovery, correct? You
write:

> This commit fixes the regression by tracking in shared memory if a live
> cluster is either in crash recovery or archive recovery as the handling
> of WAL segments ready to be archived is different in both cases (those
> WAL segments should not be removed during crash recovery), and by using
> this new shared memory state to decide if a segment can be recycled or
> not.

But don't we pretty much know this already from the state of the system?
During crash recovery there's nothing running RemoveOldXLogFiles() but
the startup process. Right? And in DB_IN_ARCHIVE_RECOVERY it should only
happen as part of restartpoints (i.e. the checkpointer).

Did you add the new shared state to avoid deducing things from the
"environment"? If so, it should really be mentioned in the commit
message & code. Because:

> Previously, it was not possible to know if a cluster was in crash
> recovery or archive recovery as the shared state was able to track only
> if recovery was happening or not, leading to the problem.

really doesn't make that obvious.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2020-04-24 02:21:15 Re: [BUG] non archived WAL removed during production crash recovery
Previous Message M Fysh 2020-04-24 00:19:27 Re: BUG #16364: ICACLS error when installing under system context "NT AUTHORITY\SYSTEM" ie installing with SCCM

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-04-24 02:21:15 Re: [BUG] non archived WAL removed during production crash recovery
Previous Message Peter Geoghegan 2020-04-24 00:44:08 Re: xid wraparound danger due to INDEX_CLEANUP false