Re: [BUG] non archived WAL removed during production crash recovery

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: [BUG] non archived WAL removed during production crash recovery
Date: 2020-04-16 22:07:39
Message-ID: 20200416234308.3e5f64bf@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Thu, 16 Apr 2020 17:11:00 +0900
Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Tue, Apr 14, 2020 at 06:03:19PM +0200, Jehan-Guillaume de Rorthais wrote:
>
> Thanks for the new version.

Thank you for v6.

> > On Mon, 13 Apr 2020 16:14:14 +0900 Michael Paquier <michael(at)paquier(dot)xyz>
> > wrote:
> >> It seems to me that the initial value of SharedRecoveryState should be
> >> CRASH_RECOVERING all the time no? StartupXLOG() is a code path taken
> >> even if starting cleanly, and the flag would be reset correctly at the
> >> end to NOT_IN_RECOVERY.
> >
> > Previous version of the patch was setting CRASH_RECOVERING. Fujii-san
> > reported (the 4 Apr 2020 02:49:50 +0900) that it could be useful to expose
> > a better value until relevant code is reached in StartupXLOG() so
> > GetRecoveryState() returns a safer value for futur use.
> >
> > As I answered upthread, I'm not sure who would need this information before
> > the WAL machinery is up though. Note that ARCHIVE_RECOVERING and
> > CRASH_RECOVERING are compatible with the previous behavior.
>
> I am not sure either if we need this information until the startup
> process is up, but even if we need it I'd rather keep the code
> consistent with the past practice, which was that
> SharedRecoveryInProgress got set to true, the equivalent of crash
> recovery as that's more generic than the archive recovery switch.

OK.

> > Maybe the solution would be to init with CRASH_RECOVERING and add some
> > comment in GetRecoveryState() to warn the state is "enforced" after the
> > XLOG machinery is started and is init'ed to RECOVERING in the meantime?
> >
> > I initialized it to CRASH_RECOVERING in the new v5 patch and added a comment
> > to GetRecoveryState().
>
> Not sure that the comment is worth it. Your description of the state
> looks enough, and the code is clear that we have just an initial
> shared memory state in this case.

OK. Will remove later after your review of the tests.

> >> It would be good to mention that while in crash recovery, files are
> >> not considered as deletable.
> >
> > Well, in fact, I am still wondering about this. I was hesitant to add a
> > shortcut like:
> >
> > /* no WAL segment cleanup during crash recovery */
> > if (recoveryState == RECOVERT_STATE_CRASH)
> > return false;
> >
> > But, what if for example we crashed for lack of disk space during intensive
> > writes? During crash recovery, any WAL marked as .done could be removed and
> > allow the system to start again and maybe make even further WAL cleanup by
> > archiving some more WAL without competing with previous high write ratio.
> >
> > When we recover a primary, this behavior seems conform with any value of
> > archive_mode, even if it has been changed after crash and before starting
> > it. On a standby, we might create .ready files, but they will be removed
> > during the first restartpoint if needed.
>
> I guess that you mean .ready and not .done here?

No, I meant .done.

> Removing .done files does not matter as the segments related to them are
> already gone.

Not necessarily. There's a time windows between the moment the archiver set
the .done file and when the checkpointer removes the associated WAL file.
So, after a PANIC because of lack of space, WAL associated with .done files but
not removed yet will be removed during the crash recovery.

> Even with that, why should we need to make the backend smarter about
> the removal of .ready files during crash recovery. It seems to me
> that we should keep them, and an operator could always come by himself
> and do some manual cleanup to free some space in the pg_wal partition.

We are agree on this.

> > I needed a failure so I can test pg_stat_archiver reports it as well. In my
> > mind, using "false" would either trigger a failure because false returns 1
> > or... a failure because the command is not found. In either way, the result
> > is the same.
> >
> > Using poll_query_until+pg_stat_file, is a good idea, but not enough as
> > archiver reports a failure some moment after the .ready signal file
> > appears.
>
> Using an empty string makes the test more portable, but while I looked
> at it I have found out that it leads to delays in the archiver except
> if you force the generation of more segments in the test, causing the
> logic to get more complicated with the manipulation of the .ready and
> .done files. And I was then finding myself to add an
> archive_timeout.. Anyway, this reduced the readability of the test so
> I am pretty much giving up on this idea.

Unless I'm wrong, the empty string does not raise an error in pg_stat_archiver,
and I wanted to add a test on this as well.

> >> The end of the tests actually relies on the
> >> fact that archive_command is set to "false" when the cold backup is
> >> taken, before resetting it. I think that it would be cleaner to
> >> enforce the configuration you want to test before starting each
> >> standby. It becomes easier to understand the flow of the test for the
> >> reader.
> >
> > done as well.
>
> I have put my hands on the code, and attached is a cleaned up
> version for the backend part. Below are some notes.
>
> + * RECOVERT_STATE_ARCHIVE is set for archive recovery or for a
> standby.
> Typo here that actually comes from my previous email, and that you
> blindly copy-pasted, repeated five times in the tree actually.

Oops...yes, even in a comment with RECOVERT_STATE_NONE :/
Sorry.

> + RecoveryState recoveryState = GetRecoveryState();
> +
> + /* The file is always deletable if archive_mode is "off". */
> + if (!XLogArchivingActive())
> + return true;
> There is no point to call GetRecoveryState() is archive_mode = off.

good catch!

> + * There's two different reasons to recover WAL: when standby mode is
> requested
> + * or after a crash to catchup with consistency.
> Nit: s/There's/There are/. Anyway, I don't see much point in keeping
> this comment as the description of each state value should be enough,
> so I have removed it.

OK

> I am currently in the middle of reviewing the test and there are a
> couple of things that can be improved but I lack of time today, so
> I'll continue tomorrow on it. There is no need to send two separate
> patches by the way as the code paths touched are different.

OK.

Thanks for your review! Let me know if you want me to add/change/fix some tests.

Regards,

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2020-04-17 05:39:21 BUG #16370: pgadmin broken
Previous Message Tomas Vondra 2020-04-16 21:36:56 Re: BUG #16363: Memory increases for each table accessed until connection is closed

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-04-16 22:12:07 Re: [DOC] Document concurrent index builds waiting on each other
Previous Message Andreas Karlsson 2020-04-16 22:05:33 Re: Poll: are people okay with function/operator table redesign?