Re: [BUG] non archived WAL removed during production crash recovery

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: [BUG] non archived WAL removed during production crash recovery
Date: 2020-04-13 07:14:14
Message-ID: 20200413071414.GF2169@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Fri, Apr 10, 2020 at 11:14:54AM +0900, Kyotaro Horiguchi wrote:
> I have the same feeling with Michael. The test that archives are
> created correctly seems to fit the file. It would be unintentionally
> that the file is not exercising archiving so much.

I have been finally looking at this thread and the latest patch set,
sorry for the late reply.

XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
- XLogCtl->SharedRecoveryInProgress = true;
XLogCtl->SharedHotStandbyActive = false;
XLogCtl->SharedPromoteIsTriggered = false;
XLogCtl->WalWriterSleeping = false;
[...]
+ switch (ControlFile->state)
+ {
+ case DB_SHUTDOWNED:
+ case DB_SHUTDOWNED_IN_RECOVERY:
+ XLogCtl->SharedRecoveryState = ARCHIVE_RECOVERING;
+ break;
+ default:
+ XLogCtl->SharedRecoveryState = CRASH_RECOVERING;
+ }
It seems to me that the initial value of SharedRecoveryState should be
CRASH_RECOVERING all the time no? StartupXLOG() is a code path taken
even if starting cleanly, and the flag would be reset correctly at the
end to NOT_IN_RECOVERY.

+typedef enum RecoveryState
+{
+ NOT_IN_RECOVERY = 0, /* currently in production */
+ CRASH_RECOVERING, /* recovering from a crash */
+ ARCHIVE_RECOVERING /* recovering archives as requested */
+} RecoveryState;
I also have some issues with the name of those variables, here is an
idea for the three states:
- RECOVERY_STATE_CRASH
- RECOVERT_STATE_ARCHIVE
- RECOVERY_STATE_NONE
I would recommend to use the same suffix for those variables to ease
grepping.

/*
- * Local copy of SharedRecoveryInProgress variable. True actually means "not
- * known, need to check the shared state".
+ * This is false when SharedRecoveryState is not ARCHIVE_RECOVERING.
+ * True actually means "not known, need to check the shared state".
*/
A double negation sounds wrong to me. And actually this variable is
false when the shared state is set to NOT_IN_RECOVERY, and true when
the state is either CRASH_RECOVERING or ARCHIVE_RECOVERING because it
means that recovery is running, be it archive recovery or crash
recovery, so the comment is wrong.

+ /* The file is always deletable if archive_mode is "off". */
+ if ( ! XLogArchivingActive() )
+ return true;
[...]
+ if ( recoveryState == ARCHIVE_RECOVERING && !XLogArchivingAlways() )
return true;
Incorrect indentation.

UpdateControlFile();
+
+ SpinLockAcquire(&XLogCtl->info_lck);
+ XLogCtl->SharedRecoveryState = ARCHIVE_RECOVERING;
+ SpinLockRelease(&XLogCtl->info_lck);
+
LWLockRelease(ControlFileLock);
Here, the shared flag is updated while holding ControlFileLock to
ensure a consistent state of things within shared memory, so it would
be nice to add a comment about that.

+RecoveryState
+GetRecoveryState(void)
+{
+ volatile XLogCtlData *xlogctl = XLogCtl;
+
+ return xlogctl->SharedRecoveryState;
+}
Er, you need to acquire info_lck to look at the state here, no?

/*
- * The file is always deletable if archive_mode is "off". On standbys
- * archiving is disabled if archive_mode is "on", and enabled with
- * "always". On a primary, archiving is enabled if archive_mode is "on"
- * or "always".
+ * On standbys, the file is deletable if archive_mode is not
+ * "always".
*/
It would be good to mention that while in crash recovery, files are
not considered as deletable.

I agree that having a separate .pl file for the tests of this thread
is just cleaner. Here are more comments about these.

+# temporary fail archive_command for futur tests
+$node->safe_psql('postgres', q{
+ ALTER SYSTEM SET archive_command TO 'false';
+ SELECT pg_reload_conf();
+});
That's likely portable on Windows even if you skip the tests there,
and I am not sure that it is a good idea to rely on it being in PATH.
Depending on the system, the path of the command is also likely going
to be different. As here the goal is to prevent the archiver to do
its work, why not relying on the configuration where archive_mode is
set but archive_command is not? This would cause the archiver to be a
no-op process, and .ready files will remain around. You could then
replace the lookup of pg_stat_archiver with poll_query_until() and a
query that makes use of pg_stat_file() to make sure that the .ready
exists when needed.

+ ok( -f "$node_data/pg_wal/archive_status/000000010000000000000001.ready",
+ "WAL still ready to archive in archive_status");
It would be good to mention in the description the check applies to a
primary.

+# test recovery without archive_mode=always does not keep .ready WALs
+$standby1 = get_new_node('standby');
+$standby1->init_from_backup($node, 'backup', has_restoring => 1);
+$standby1_data = $standby1->data_dir;
+$standby1->start;
+$standby1->safe_psql('postgres', q{CHECKPOINT});
For readability archive_mode = on should be added to the configuration
file? Okay, this is inherited from the primary, still that would
avoid any issues with this code is refactored in some way.

"WAL waiting to be archived in backup removed with archive_mode=on
on standby" );
That should be "WAL segment" or "WAL file", but not WAL.

Regarding the tests on a standby, it seems to me that the following
is necessary:
1) Test that with archive_mode = on, segments are not marked with
.ready.
2) Test that with archive_mode = always, segments are marked with
.ready during archive recovery.
3) Test that with archive_mode = always, segments are not removed
during crash recovery.
I can see tests for 1) and 2), but not 3). Could you add a
stop('immediate')+start() for $standby2 at the end of
020_archive_status.pl and check that the .ready file is still there
after crash recovery? The end of the tests actually relies on the
fact that archive_command is set to "false" when the cold backup is
taken, before resetting it. I think that it would be cleaner to
enforce the configuration you want to test before starting each
standby. It becomes easier to understand the flow of the test for the
reader.
--
Michael

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Kamil Jońca 2020-04-13 07:25:35 backend crash
Previous Message Artur Zakirov 2020-04-12 14:13:26 Re: BUG #16337: Finnish Ispell dictionary cannot be created

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-04-13 07:40:14 Re: WAL usage calculation patch
Previous Message Kyotaro Horiguchi 2020-04-13 06:45:56 Re: pg_validatebackup -> pg_verifybackup?