Re: [BUG] non archived WAL removed during production crash recovery

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [BUG] non archived WAL removed during production crash recovery
Date: 2020-04-14 16:09:23
Message-ID: 20200414180923.759f4d9c@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Fri, 10 Apr 2020 11:00:31 +0900 (JST)
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
[...]
> > > but the mistake here is it thinks that inRecovery represents whether it is
> > > running as a standby or not, but actually it is true on primary during
> > > crash recovery.
> >
> > Indeed.
> >
> > > On the other hand, with the patch, standby with archive_mode=on
> > > wrongly archives WAL files during crash recovery.
> >
> > "without the patch" you mean? You are talking about 78ea8b5daab, right?
>
> No. I menat the v4 patch in [1].
>
> [1] https://www.postgresql.org/message-id/20200407171736.61906608%40firost
>
> Prior to appllying the patch (that is the commit 78ea..),
> XLogArchiveCheckDone() correctly returns true (= allow to remove it)
> for the same condition.
>
> The proposed patch does the folloing thing.
>
> if (!XLogArchivingActive() ||
> recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways())
> return true;
>
> It doesn't return for the condition "recoveryState=CRASH_RECOVERING
> and archive_mode = on". Then the WAL files is mitakenly marked as
> ".ready" if not marked yet.

Indeed.

But .ready files are then deleted during the first restartpoint. I'm not sure
how to fix this behavior without making the code too complex.

This is discussed in my last answer to Michael few minutes ago as well.

> By the way, the code seems not following the convention a bit
> here. Let the inserting code be in the same style to the existing code
> around.
>
> + if ( ! XLogArchivingActive() )
>
> I think we don't put the spaces within the parentheses above.

Indeed, This is fixed in patch v5 sent few minutes ago.

> | ARCHIVE_RECOVERING/CRASH_RECOVERING/NOT_IN_RECOVERY
>
> The first two and the last one are in different style. *I* prefer them
> (if we use it) be "IN_ARCHIVE_RECOVERY/IN_CRASH_RECOVERY/NOT_IN_RECOVERY".

I like Michael's proposal. See v5 of the patch.

Thank you for your review!

Regards,

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2020-04-14 17:12:47 BUG #16362: yum repo: duplicated definition
Previous Message Jehan-Guillaume de Rorthais 2020-04-14 16:03:19 Re: [BUG] non archived WAL removed during production crash recovery

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Joseph Krogh 2020-04-14 16:49:11 ERROR: could not determine which collation to use for string comparison
Previous Message Jehan-Guillaume de Rorthais 2020-04-14 16:03:19 Re: [BUG] non archived WAL removed during production crash recovery