Re: needless complexity in StartupXLOG

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: needless complexity in StartupXLOG
Date: 2021-07-26 19:53:20
Message-ID: CA+TgmoaKy9RRm-zaeFF=UCwR7J=R0xEbVvQ-WLR=0mAbOZ2P+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 26, 2021 at 1:32 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Yeah, tend to agree with this too ... but something I find a bit curious
> is the comment:
>
> * Insert a special WAL record to mark the end of
> * recovery, since we aren't doing a checkpoint.
>
> ... immediately after setting promoted = true, and then at the end of
> StartupXLOG() having:
>
> if (promoted)
> RequestCheckpoint(CHECKPOINT_FORCE);
>
> maybe I'm missing something, but seems like that comment isn't being
> terribly clear. Perhaps we aren't doing a full checkpoint *there*, but
> sure looks like we're going to do one moments later regardless of
> anything else since we've set promoted to true..?

Yep. So it's a question of whether we allow operations that might
write WAL in the meantime. When we write the checkpoint record right
here, there can't be any WAL from the new server lifetime until the
checkpoint completes. When we write an end-of-recovery record, there
can. And there could actually be quite a bit, because if we do the
checkpoint right in this section of code, it will be a fast
checkpoint, whereas in the code you quoted above, it's a spread
checkpoint, which takes a lot longer. So the question is whether it's
reasonable to give the checkpoint some time to complete or whether it
needs to be completed right now.

> Agreed that simpler logic is better, provided it's correct logic, of
> course. Finding better ways to test all of this would be really nice.

Yeah, and there again, it's a lot easier to test if we don't have as
many cases. Now no single change is going to fix that, but the number
of flag variables in xlog.c is simply bonkers. This particular stretch
of code uses 3 of them to even decide whether to attempt the test in
question, and all of those are set in complex ways depending on the
values of still more flag variables. The comments where
bgwriterLaunched is set claim that we only do this during archive
recovery, not crash recovery, but the code depends on the value of
ArchiveRecoveryRequested, not InArchiveRecovery. So I wonder if we
can't get the bgwriter to run even during crash recovery in the
scenario described by:

* It's possible that archive recovery was requested, but we don't
* know how far we need to replay the WAL before we reach consistency.
* This can happen for example if a base backup is taken from a
* running server using an atomic filesystem snapshot, without calling
* pg_start/stop_backup. Or if you just kill a running primary server
* and put it into archive recovery by creating a recovery signal
* file.

If we ran the bgwriter all the time during crash recovery, we'd know
for sure whether that causes any problems. If we only do it like this
in certain corner cases, it's much more likely that we have bugs.
Grumble, grumble.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-07-26 19:56:01 Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Previous Message Alvaro Herrera 2021-07-26 19:35:29 Re: Some code cleanup for pgbench and pg_verifybackup