Re: [BUG] non archived WAL removed during production crash recovery

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: [BUG] non archived WAL removed during production crash recovery
Date: 2020-04-16 08:11:00
Message-ID: 20200416081100.GD81957@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Tue, Apr 14, 2020 at 06:03:19PM +0200, Jehan-Guillaume de Rorthais wrote:

Thanks for the new version.

> On Mon, 13 Apr 2020 16:14:14 +0900 Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> It seems to me that the initial value of SharedRecoveryState should be
>> CRASH_RECOVERING all the time no? StartupXLOG() is a code path taken
>> even if starting cleanly, and the flag would be reset correctly at the
>> end to NOT_IN_RECOVERY.
>
> Previous version of the patch was setting CRASH_RECOVERING. Fujii-san reported
> (the 4 Apr 2020 02:49:50 +0900) that it could be useful to expose a better value
> until relevant code is reached in StartupXLOG() so GetRecoveryState() returns
> a safer value for futur use.
>
> As I answered upthread, I'm not sure who would need this information before the
> WAL machinery is up though. Note that ARCHIVE_RECOVERING and CRASH_RECOVERING
> are compatible with the previous behavior.

I am not sure either if we need this information until the startup
process is up, but even if we need it I'd rather keep the code
consistent with the past practice, which was that
SharedRecoveryInProgress got set to true, the equivalent of crash
recovery as that's more generic than the archive recovery switch.

> Maybe the solution would be to init with CRASH_RECOVERING and add some comment
> in GetRecoveryState() to warn the state is "enforced" after the XLOG machinery
> is started and is init'ed to RECOVERING in the meantime?
>
> I initialized it to CRASH_RECOVERING in the new v5 patch and added a comment
> to GetRecoveryState().

Not sure that the comment is worth it. Your description of the state
looks enough, and the code is clear that we have just an initial
shared memory state in this case.

>> + /* The file is always deletable if archive_mode is "off". */
>> + if ( ! XLogArchivingActive() )
>> + return true;
>> [...]
>> + if ( recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways() )
>> return true;
>> Incorrect indentation.
>
> Is it the spaces as reported by Horiguchi-san? I removed them in latest patch.

Yes.

>> It would be good to mention that while in crash recovery, files are
>> not considered as deletable.
>
> Well, in fact, I am still wondering about this. I was hesitant to add a
> shortcut like:
>
> /* no WAL segment cleanup during crash recovery */
> if (recoveryState == RECOVERT_STATE_CRASH)
> return false;
>
> But, what if for example we crashed for lack of disk space during intensive
> writes? During crash recovery, any WAL marked as .done could be removed and
> allow the system to start again and maybe make even further WAL cleanup by
> archiving some more WAL without competing with previous high write ratio.
>
> When we recover a primary, this behavior seems conform with any value of
> archive_mode, even if it has been changed after crash and before starting it.
> On a standby, we might create .ready files, but they will be removed during the
> first restartpoint if needed.

I guess that you mean .ready and not .done here? Removing .done files
does not matter as the segments related to them are already gone.
Even with that, why should we need to make the backend smarter about
the removal of .ready files during crash recovery. It seems to me
that we should keep them, and an operator could always come by himself
and do some manual cleanup to free some space in the pg_wal partition.

> I needed a failure so I can test pg_stat_archiver reports it as well. In my
> mind, using "false" would either trigger a failure because false returns 1 or...
> a failure because the command is not found. In either way, the result is the
> same.
>
> Using poll_query_until+pg_stat_file, is a good idea, but not enough as
> archiver reports a failure some moment after the .ready signal file appears.

Using an empty string makes the test more portable, but while I looked
at it I have found out that it leads to delays in the archiver except
if you force the generation of more segments in the test, causing the
logic to get more complicated with the manipulation of the .ready and
.done files. And I was then finding myself to add an
archive_timeout.. Anyway, this reduced the readability of the test so
I am pretty much giving up on this idea.

>> The end of the tests actually relies on the
>> fact that archive_command is set to "false" when the cold backup is
>> taken, before resetting it. I think that it would be cleaner to
>> enforce the configuration you want to test before starting each
>> standby. It becomes easier to understand the flow of the test for the
>> reader.
>
> done as well.

I have put my hands on the code, and attached is a cleaned up
version for the backend part. Below are some notes.

+ * RECOVERT_STATE_ARCHIVE is set for archive recovery or for a
standby.
Typo here that actually comes from my previous email, and that you
blindly copy-pasted, repeated five times in the tree actually.

+ RecoveryState recoveryState = GetRecoveryState();
+
+ /* The file is always deletable if archive_mode is "off". */
+ if (!XLogArchivingActive())
+ return true;
There is no point to call GetRecoveryState() is archive_mode = off.

+ * There's two different reasons to recover WAL: when standby mode is
requested
+ * or after a crash to catchup with consistency.
Nit: s/There's/There are/. Anyway, I don't see much point in keeping
this comment as the description of each state value should be enough,
so I have removed it.

I am currently in the middle of reviewing the test and there are a
couple of things that can be improved but I lack of time today, so
I'll continue tomorrow on it. There is no need to send two separate
patches by the way as the code paths touched are different.
--
Michael

Attachment Content-Type Size
recovery-archives-v6.patch text/x-diff 12.8 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-04-16 09:10:49 Re: BUG #16368: Incorrect function inlining in the presence of a window function
Previous Message Michael Paquier 2020-04-16 05:30:36 Re: BUG #16369: Segmentation Faults and Data Corruption with Generated Columns

Browse pgsql-hackers by date

  From Date Subject
Next Message Rajkumar Raghuwanshi 2020-04-16 08:26:47 ERROR: could not open file "pg_tblspc/ issue with replication setup.
Previous Message Masahiko Sawada 2020-04-16 07:48:28 Re: Race condition in SyncRepGetSyncStandbysPriority