Re: [BUG] non archived WAL removed during production crash recovery

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, michael(at)paquier(dot)xyz
Subject: Re: [BUG] non archived WAL removed during production crash recovery
Date: 2020-04-07 15:17:36
Message-ID: 20200407171736.61906608@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Sat, 4 Apr 2020 02:49:50 +0900
Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:

> On 2020/04/04 1:26, Jehan-Guillaume de Rorthais wrote:
> > On Thu, 2 Apr 2020 23:55:46 +0900
> > Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> >
> >> On 2020/04/02 22:49, Jehan-Guillaume de Rorthais wrote:
> >>> On Thu, 2 Apr 2020 19:38:59 +0900
> >>> Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> > [...]
> >>>> That is, WAL files with .ready files are removed when either
> >>>> archive_mode!=always in standby mode or archive_mode=off.
> >>>
> >>> sounds fine to me.
> >
> > To some extends, it appears to me this sentence was relatively close to my
> > previous patch, as far as you exclude IN_CRASH_RECOVERY from the shortcut:
> >
> > XLogArchiveCheckDone(const char *xlog)
> > {
> > [...]
> > if ( (inRecoveryState != IN_CRASH_RECOVERY) && (
> > (inRecoveryState == NOT_IN_RECOVERY && !XLogArchivingActive()) ||
> > (inRecoveryState == IN_ARCHIVE_RECOVERY
> > && !XLogArchivingAlways()))) return true;
> >
> > Which means that only .done cleanup would occurs during CRASH_RECOVERY
> > and .ready files might be created if no .done exists. No matter the futur
> > status of the cluster: primary or standby. Normal shortcut will apply
> > during first checkpoint after the crash recovery step.
> >
> > This should handle the case where a backup without backup_label (taken from
> > a snapshot or after a shutdown with immediate) is restored to build a
> > standby.
> >
> > Please, find in attachment a third version of my patch
> > "0001-v3-Fix-WAL-retention-during-production-crash-recovery.patch".
>
> Thanks for updating the patch! Here are my review comments:
>
> - bool SharedRecoveryInProgress;
> + RecoveryState SharedRecoveryInProgress;
>
> Since the patch changes the meaning of this variable, the name of
> the variable should be changed? Otherwise, the current name seems
> confusing.

Indeed, fixed using SharedRecoveryState

> + SpinLockAcquire(&XLogCtl->info_lck);
> + XLogCtl->SharedRecoveryInProgress =
> IN_CRASH_RECOVERY;
> + SpinLockRelease(&XLogCtl->info_lck);
>
> As I explained upthread, this code can be reached and IN_CRASH_RECOVERY
> can be set even in standby or archive recovery. Is this right behavior that
> you're expecting?

Yes. This patch avoids archive cleanup during crash recovery altogether,
whatever the requested status for the cluster.

> Even in crash recovery case, GetRecoveryState() returns IN_ARCHIVE_RECOVERY
> until this code is reached.

I tried to stick as close as possible with "ControlFile->state" and old
XLogCtl->SharedRecoveryInProgress variables. That's why it's initialized as
IN_ARCHIVE_RECOVERY as XLogCtl->SharedRecoveryInProgress was init as true as
well.

The status itself is set during StartupXLOG when the historical code actually
tries to define and record the real state between DB_IN_ARCHIVE_RECOVERY and
DB_IN_CRASH_RECOVERY.

> Also when WAL replay is not necessary (e.g., restart of the server shutdowed
> cleanly before), GetRecoveryState() returns IN_ARCHIVE_RECOVERY because this
> code is not reached.

It is set to NOT_IN_RECOVERY at the end of StartupXLOG, in the same place we
set ControlFile->state = DB_IN_PRODUCTION. So GetRecoveryState() returns
NOT_IN_RECOVERY as soon as StartupXLOG is done when no WAL replay is necessary.

> Aren't these fragile? If XLogArchiveCheckDone() is only user of
> GetRecoveryState(), they would be ok. But if another user will appear in the
> future, it seems very easy to mistake. At least those behaviors should be
> commented in GetRecoveryState().

We certainly can set SharedRecoveryState earlier, in XLOGShmemInit, based on
the ControlFile->state value. In my understanding, anything different than
DB_SHUTDOWNED or DB_SHUTDOWNED_IN_RECOVERY can be considered as a crash
recovery. Based on this XLOGShmemInit can init SharedRecoveryState to
IN_ARCHIVE_RECOVERY or IN_CRASH_RECOVERY.

With either values, RecoveryInProgress() keep returning the same result so any
current code relying on RecoveryInProgress() is compatible.

I'm not sure who would need this information before the WAL machinery is up,
but is it safe enough in your opinion for futur usage of GetRecoveryState()? Do
you think this information might be useful before the WAL machinery is set?
Current "user" (eg. restoreTwoPhaseData()) only need to know if we are in
recovery, whatever the reason.

The patch in attachment set SharedRecoveryState to either IN_ARCHIVE_RECOVERY
or IN_CRASH_RECOVERY from XLOGShmemInit based on the ControlFile state. It
feels strange though to set this so far away from ControlFile->state
= DB_IN_CRASH_RECOVERY...

> - if (!((XLogArchivingActive() && !inRecovery) ||
> - (XLogArchivingAlways() && inRecovery)))
> + if ( (inRecoveryState != IN_CRASH_RECOVERY) && (
> + (inRecoveryState == NOT_IN_RECOVERY
> && !XLogArchivingActive()) &&
> + (inRecoveryState == IN_ARCHIVE_RECOVERY
> && !XLogArchivingAlways()))) return true;
>
> The last condition seems to cause XLogArchiveCheckDone() to return
> true in archive recovery mode with archive_mode=on, then cause
> unarchived WAL files with .ready to be removed. Is my understanding right?
> If yes, that behavior doesn't seem to match with our consensus, i.e.,
> WAL files with .ready should not be removed in that case.

We wrote:

>> That is, WAL files with .ready files are removed when either
>> archive_mode!=always in standby mode or archive_mode=off.
>
> sounds fine to me.

So if in standby mode and archive_mode is not "always", the .ready files
should be removed.

In current patch, I split the conditions in the sake of clarity.

> +/* Recovery state */
> +typedef enum RecoveryState
> +{
> + NOT_IN_RECOVERY = 0,
> + IN_CRASH_RECOVERY,
> + IN_ARCHIVE_RECOVERY
> +} RecoveryState;
>
> Isn't it better to add more comments here? For example, what does
> "Recovery state" mean? Which state is used in standby mode? Why? ,etc.

Explanations added.

> Is it really ok not to have the value indicating standby mode?

Unless I'm wrong, this shared state only indicates why we are recovering WAL,
it does not need reflect the status of the instance in shared memory.
StandbyMode is already available for the startup process. Would it be useful
to share this mode outside of the startup process?

> These enum values names are confusing because the variables with
> similar names already exist. For example, IN_CRASH_RECOVERY vs.
> DB_IN_CRASH_RECOVERY. So IMO it seems better to rename them,
> .e.g., by adding the prefix.

Well, I lack of imagination. So I picked CRASH_RECOVERING and
ARCHIVE_RECOVERING.

> > "0002-v1-Add-test-on-non-archived-WAL-during-crash-recovery.patch" is left
> > untouched. But I'm considering adding some more tests relative to this
> > discussion.
> >
> >> BTW, now I'm thinking that the flag in shmem should be updated when
> >> the startup process sets StandbyModeRequested to true at the beginning
> >> of the recovery. That is,
> >>
> >> - Add something like SharedStandbyModeRequested into XLogCtl. This field
> >> should be initialized with false;
> >> - Set XLogCtl->SharedStandbyModeRequested to true when the startup
> >> process detects the standby.signal file and sets the local variable
> >> StandbyModeRequested to true.
> >> - Make XLogArchiveCheckDone() use XLogCtl->SharedStandbyModeRequested
> >> to know whether the server is in standby mode or not.
> >>
> >> Thought?
> >
> > I try to avoid a new flag in memory with the proposal in attachment of this
> > email. It seems to me various combinaisons of booleans with subtle
> > differences around the same subject makes it a bit trappy and complicated
> > to understand.
>
> Ok, so firstly I try to review your patch!

Thank you for your review and help!

In attachment:
* fix proposal: 0001-v4-Fix-WAL-retention-during-production-crash-recovery.patch
* add test: 0002-v2-Add-test-on-non-archived-WAL-during-crash-recovery.patch
0002-v2 fix conflict with current master.

Regards,

Attachment Content-Type Size
0001-v4-Fix-WAL-retention-during-production-crash-recovery.patch text/x-patch 7.0 KB
0002-v2-Add-test-on-non-archived-WAL-during-crash-recovery.patch text/x-patch 1.5 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2020-04-07 15:22:28 Re: BUG #16348: Memory leak when parsing config
Previous Message Masahiko Sawada 2020-04-07 14:35:16 Re: [bug] Wrong bool value parameter

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-04-07 15:34:15 Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Previous Message Tomas Vondra 2020-04-07 15:15:21 Re: [HACKERS] advanced partition matching algorithm for partition-wise join