Re: [BUG] non archived WAL removed during production crash recovery

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: jgdr(at)dalibo(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: [BUG] non archived WAL removed during production crash recovery
Date: 2020-04-20 07:02:31
Message-ID: 20200420.160231.719176619682386260.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

At Sat, 18 Apr 2020 18:26:11 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Fri, Apr 17, 2020 at 03:33:04PM +0200, Jehan-Guillaume de Rorthais wrote:
> > On Fri, 17 Apr 2020 15:50:43 +0900 Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >> Not sure that it is something that matters for this thread though, so
> >> if necessary I think that it could be discussed separately.
> >
> > OK. However, unless I'm wrong, what I am describing as a desired behavior
> > is the current behavior of XLogArchiveCheckDone. So, we might want to decide if
> > v8 should return false during crash recovery no matter the archive_mode setup,
> > or if we keep the current behavior. I vote for keeping it this way.
>
> I would rather avoid that now, as we don't check explicitely for crash
> recovery in this code path. And for the purpose of this patch it is
> fine to stick with the extra check on a standby with
> (RECOVERY_STATE_ARCHIVE && archive_mode = always).

The commit 78ea8b5daa intends that WAL segments are properly removed
on standby with archive_mode=on by not marking .ready. The v7
actually let such segments be marked .ready, but they are finally
removed after entering archive recovery. It preserves the patch's
intention in that perspective. (I'd rather prefer to distinguish
"ArchiveRecoveryRequested" somehow but it would be more complex and it
is not the agreement on this thread.)

As the result, +1 to what v7 is doing and discussing on earlier
removal of such WAL segments separately if needed.

> > Maybe we could use something more common for all plateform? Eg.:
> >
> > archive_command='this command does not exist'
> >
> > At least, we would have the same error everywhere, as far as it could matter...
>
> Yeah. We could try to do with "false" as command anyway, and see what
> the buildfarm thinks. As the test is skipped on Windows, I would
> assume that it does not matter much anyway. Let's see what others
> think about this piece. I don't have plans to touch again this patch
> until likely the middle of next week.

Couldn't we use "/" as a globally-results-in-failure command? But
that doesn't increment failed_count. The reason is pgarch_archiveXLog
exits with FATAL for "is a directory" error. The comment asserts that
we exit with FATAL for SIGINT or SIGQUIT and if so it is enough to
check only exit-by-signal case. The following fix worked.

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 01ffd6513c..def6a68063 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -595,7 +595,7 @@ pgarch_archiveXlog(char *xlog)
* "command not found" type of error. If we overreact it's no big
* deal, the postmaster will just start the archiver again.
*/
- int lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;
+ int lev = wait_result_is_any_signal(rc, false) ? FATAL : LOG;

if (WIFEXITED(rc))
{

I didn't tested it on Windows (I somehow broke my repo and it's too
slow to clone.) but system("/") returned 1 and I think that result
increments the counter.

> >> Thanks, I have worked more on the test, refactoring pieces related to
> >> the segment names, adjusting some comments and fixing some of the
> >> logic. Note that you introduced something incorrect at the creation
> >> of $standby2 as you have been updating postgresql.conf.auto for
> >> $standby1.
> >
> > erf, last minute quick edit with lack of review on my side :(
>
> No problem. It happens.
>
> >> I have noticed an extra issue while looking at the backend pieces
> >> today: at the beginning of the REDO loop we forgot one place where
> >> SharedRecoveryState *has* to be updated to a correct state (around
> >> the comment "Update pg_control to show that we are..." in xlog.c) as
> >> the startup process may decide to switch the control file state to
> >> DB_IN_ARCHIVE_RECOVERY or DB_IN_CRASH_RECOVERY, but we forgot to
> >> update the new shared flag at this early stage. It did not matter
> >> before because SharedRecoveryInProgress would be only "true" for both,
> >> but that's not the case anymore as we need to make the difference
> >> between crash recovery and archive recovery in the new flag. There is
> >> no actual need to update SharedRecoveryState to RECOVERY_STATE_CRASH
> >> as the initial shared memory state is RECOVERY_STATE_CRASH, but
> >> updating the flag makes the code more consistent IMCRASHO so I updated it
> >> anyway in the attached.
> >
> > Grmbl...I had this logic the other way around: init with
> > RECOVERY_STATE_RECOVERY and set to CRASH in this exact if/then/else block... I
> > removed it in v4 when setting XLogCtl->SharedRecoveryState to RECOVERY or CRASH
> > based on ControlFile->state.
> >
> > Sorry, I forgot it after discussing the init value in v5 :(
>
> Indeed. The extra initialization was part of v4, and got removed as
> of v5. Still, it seems to me that this part was not complete without
> updating the shared memory field correctly at the beginning of the
> REDO processing as the last version of the patch does.

I may not be following the discussion, but I think it is reasonable
that SharedRecoveryState is initialized as CRASH then moves to ARCHIVE
as needed and finished by NONE. That transition also stables
RecoveryInProgress().

Other minor comments:

+ RECOVERY_STATE_NONE /* currently in production */

I think it would be better be RECOVERY_STATE_DONE.

By the way I noticed that RecoveryState is exactly a subset of
DBState. And changes of SharedRecoveryState happens side-by-side with
ControlFileData->state in most places. Coundn't we just usee
ControlFile->state instead of SharedRecoveryState?

By the way I found a typo.

+# Recovery tests for the achiving with a standby partially check
s/achiving/archiving/

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2020-04-20 07:34:44 Re: [BUG] non archived WAL removed during production crash recovery
Previous Message David G. Johnston 2020-04-20 05:32:24 Re: BUG #16381: create table in pgAdmin4 but unable to use it in SQL

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-04-20 07:15:40 Re: 001_rep_changes.pl stalls
Previous Message Noah Misch 2020-04-20 07:02:15 Re: 001_rep_changes.pl stalls