Re: Split xlog.c

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Split xlog.c
Date: 2021-11-22 23:10:36
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here's a new version. It includes two new smaller commits, before the
main refactoring:

1. Refactor setting XLP_FIRST_IS_OVERWRITE_CONTRECORD. I moved the code
to set that flag from AdvanceXLInsertBuffer() into
CreateOverwriteContrecordRecord(). That avoids the need for accessing
the global variable in AdvanceXLInsertBuffer(), which is nice with this
patch set because I moved the global variables into xlogrecord.c. For
comparison, when we are writing a continuation record, the
XLP_FIRST_IS_CONTRECORD flag is also set by the caller,
CopyXLogRecordToWAL(), not AdvanceXLInsertBuffer() itself. So I think
this is marginally more clear anyway.

2. Use correct WAL position in error message on invalid XLOG page
header. This is the thing that Robert pointed out in the "xlog.c:
removing ReadRecPtr and EndRecPtr" thread. I needed to make the change
for the refactoring anyway, but since it's a minor bug fix, it seemed
better to extract it to a separate commit, after all.

Responses to Robert's comments below:

On 20/10/2021 22:06, Robert Haas wrote:
> - Some logic to (a) sanity-check the control file's REDO pointer, (b)
> set InRecovery = true, and (c) update various bits of control file
> state in memory has been moved substantially earlier. The actual
> update of the control file on disk stays where it was before. At least
> on first reading, I don't really like this. On the one hand, I don't
> see a reason why it's necessary prerequisite for splitting xlog.c. On
> the other hand, it seems a bit dangerous.

The new contents of the control file are determined by the checkpoint
record, presence of backup label file, and whether we're doing archive
recovery. We have that information at hand in InitWalRecovery(), whereas
the caller doesn't know or care whether a backup label file was present,
for example. That's why I wanted to move that logic to InitWalRecovery().

However, I was afraid of moving the actual call to UpdateControlFile()
there. That would be a bigger behavioral change. What if initializing
one of the subsystems fails? Currently, the control file is left
unchanged, but if we called UpdateControlFile() earlier, then it would
be modified already.

> There's now ~8 calls to functions in other modules between the time
> you change things in memory and the time that you call
> UpdateControlFile(). Perhaps none of those functions can call
> anything that might in turn call UpdateControlFile() but I don't know
> why we should take the chance. Is there some advantage to having the
> in-memory state out of sync with the on-disk state across all that
> code?
The functions that get called in between don't call UpdateControlFile()
and don't affect what gets written there. It would be pretty
questionable if they did, even on master. But for the sake of the
argument, let's see what would happen if they did:

master: The later call to UpdateControlFile() writes out the same values
again. Unless the changed field was one of the following: 'state',
'checkPoint', 'checkPointCopy', 'minRecoveryPoint',
'minRecoveryPointTLI', 'backupStartPoint', 'backupEndRequired' or
'time'. If it was one of those, then it may be overwritten with the
values deduced from the starting checkpoint.

After these patches: The later call to UpdateControlFile() writes out
the same values again, even if it was one of those fields.

Seems like a wash to me. It's hard to tell which behavior would be the
correct one.

On 'master', InRecovery might or might not already be set when we call
those functions. It is already set if there was a backup label file, but
if we're doing recover for any other reason, it's set only later. That's
pretty sloppy. We check InRecovery in various assertions, and it affects
whether UpdateMinRecoveryPoint() updates the control file or not, among
other things. With these patches, InRecovery is always set at that point
(or not, if recovery is not needed). That's a bit besides the point
here, but it highlights that the current coding isn't very robust either
if those startup functions tried to modify the control file. I think
these patches make it a little better, or at least not worse.

> - The code to clear InArchiveRecovery and close the WAL segment we had
> open moves earlier. I think it might be possible to fail
> Assert(InArchiveRecovery), because where you've moved this code, we
> haven't yet verified that we reached the minimum recovery point. See
> the comment which begins "It's possible that archive recovery was
> requested, but we don't know how far we need to replay the WAL before
> we reach consistency." What if we reach that point, then fail the big
> hairy if-test and don't set InArchiveRecovery = true? In that case, we
> can still do it later, in ReadRecord. But maybe that will never
> happen. Actually it's not entirely clear to me that the assertion is
> bulletproof even where it is right now, but moving it earlier makes me
> even less confident. Possibly I just don't understand this well
> enough.

Hmm, yeah, this logic is hairy. I tried to find a case where that
assertion would fail but couldn't find one. I believe it's correct, but
we could probably make it more clear.

In a nutshell, PerformWalRecovery() will never return, if
(ArchiveRecoveryRequested && !InArchiveRecovery). Why? There are two
ways that PerformWalRecovery() can return:

1. After reaching end of WAL. ReadRecord() will always always set
InArchiveRecovery in that case, if ArchiveRecoveryRequested was set. It
won't return NULL without doing that.

2. We reached the requested recovery target point. There's a check for
that case in PerformWalRecovery(), it will throw an "ERROR: requested
recovery stop point is before consistent recovery point" if that happens
before InArchiveRecovery is set. Because reachedConsistency isn't set
until crash recovery is finished.

That said, independently of this patch series, perhaps that assertion
should be changed into something like this:

if (ArchiveRecoveryRequested)
- Assert(InArchiveRecovery);
+ /*
+ * If archive recovery was requested, we should not finish
+ * recovery before starting archive recovery.
+ *
+ * There are other checks for this in PerformWalRecovery() so
+ * this shouldn't happen, but let's be safe.
+ */
+ if (!InArchiveRecovery)
+ elog(ERROR, "archive recovery was requested, but recovery
finished before it started");

> It's a little tempting, too, to see if you could somehow consolidate
> the two places that do if (readFile >= 0) { close(readFile); readFile
> = -1 } down to one.

Yeah, I thought about that, but couldn't find a nice way to do it.

> - getRecoveryStopReason() is now called earlier than before, and is
> now called whether or not ArchiveRecoveryRequested. This seems to
> just move the point of initialization further from the point of use
> to no real advantage, and I also think that the function is only
> designed to do something useful for archive recovery, so calling it
> in other cases just seems confusing.

On the other hand, it's now closer to the actual end-of-recovery. The
idea here is that it seems natural to return the reason that recovery
ended along with all the other end-of-recovery information, in the same
EndOfWalRecoveryInfo struct.

Kyotaro commented on the same thing and suggested keeping the call
getRecoveryStopReason() where it was. That'd require exposing
getRecoveryStopReason() from xlogrecovery.c. Which isn't a big deal, we
could do it, but in general I tried to minimize the surface area between
xlog.c and xlogrecovery.c. If getRecoveryStopReason() was a separate
function, should standby_signal_file_found and
recovery_signal_file_found also be separate functions? I'd prefer to
gather all the end-of-recovery information into one struct.

> That's all I have on 0001. Is this kind of review helpful?

Yes, very helpful, thank you!

- Heikki

Attachment Content-Type Size
v7-0001-Refactor-setting-XLP_FIRST_IS_OVERWRITE_CONTRECOR.patch text/x-patch 3.4 KB
v7-0002-Use-correct-WAL-position-in-error-message-on-inva.patch text/x-patch 1.7 KB
v7-0003-Move-code-around-in-StartupXLOG.patch text/x-patch 24.2 KB
v7-0004-Split-xlog.c-into-xlog.c-and-xlogrecovery.c.patch text/x-patch 323.5 KB
v7-0005-Move-code-to-apply-one-WAL-record-to-a-subroutine.patch text/x-patch 11.0 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2021-11-22 23:11:18 Re: Split xlog.c
Previous Message Tom Lane 2021-11-22 22:32:21 Re: Reduce function call costs on ELF platforms