Re: [Patch] ALTER SYSTEM READ ONLY

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2021-07-23 20:03:25
Message-ID: CA+TgmoY65W8RNrXRSMWyVX3s66MDg+Kzkp2z7A4sAZjFF+igRA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 17, 2021 at 1:23 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> Attached is rebase for the latest master head. Also, I added one more
> refactoring code that deduplicates the code setting database state in the
> control file. The same code set the database state is also needed for this
> feature.

I started studying 0001 today and found that it rearranged the order
of operations in StartupXLOG() more than I was expecting. It does, as
per previous discussions, move a bunch of things to the place where we
now call XLogParamters(). But, unsatisfyingly, InRecovery = false and
XLogReaderFree() then have to move down even further. Since the goal
here is to get to a situation where we sometimes XLogAcceptWrites()
after InRecovery = false, it didn't seem nice for this refactoring
patch to still end up with a situation where this stuff happens while
InRecovery = true. In fact, with the patch, the amount of code that
runs with InRecovery = true actually *increases*, which is not what I
think should be happening here. That's why the patch ends up having to
adjust SetMultiXactIdLimit to not Assert(!InRecovery).

And then I started to wonder how this was ever going to work as part
of the larger patch set, because as you have it here,
XLogAcceptWrites() takes arguments XLogReaderState *xlogreader,
XLogRecPtr EndOfLog, and TimeLineID EndOfLogTLI and if the
checkpointer is calling that at a later time after the user issues
pg_prohibit_wal(false), it's going to have none of those things. So I
had a quick look at that part of the code and found this in
checkpointer.c:

XLogAcceptWrites(true, NULL, InvalidXLogRecPtr, 0);

For those following along from home, the additional "true" is a bool
needChkpt argument added to XLogAcceptWrites() by 0003. Well, none of
this is very satisfying. The whole purpose of passing the xlogreader
is so we can figure out whether we need a checkpoint (never mind the
question of whether the existing algorithm for determining that is
really sensible) but now we need a second argument that basically
serves the same purpose since one of the two callers to this function
won't have an xlogreader. And then we're passing the EndOfLog and
EndOfLogTLI as dummy values which seems like it's probably just
totally wrong, but if for some reason it works correctly there sure
don't seem to be any comments explaining why.

So I started doing a bit of hacking myself and ended up with the
attached, which I think is not completely the right thing yet but I
think it's better than your version. I split this into three parts.
0001 splits up the logic that currently decides whether to write an
end-of-recovery record or a checkpoint record and if the latter how
the checkpoint ought to be performed into two functions.
DetermineRecoveryXlogAction() figures out what we want to do, and
PerformRecoveryXlogAction() does it. It also moves the code to run
recovery_end_command and related stuff into a new function
CleanupAfterArchiveRecovery(). 0002 then builds on this by postponing
UpdateFullPageWrites(), PerformRecoveryXLogAction(), and
CleanupAfterArchiveRecovery() to just before we
XLogReportParameters(). Because of the refactoring done by 0001, this
is only a small amount of code movement. Because of the separation
between DetermineRecoveryXlogAction() and PerformRecoveryXlogAction(),
the latter doesn't need the xlogreader. So we can do
DetermineRecoveryXlogAction() at the same time as now, while the
xlogreader is available, and then we don't need it later when we
PerformRecoveryXlogAction(), because we already know what we need to
know. I think this is all fine as far as it goes.

My 0003 is where I see some lingering problems. It creates
XLogAcceptWrites(), moves the appropriate stuff there, and doesn't
need the xlogreader. But it doesn't really solve the problem of how
checkpointer.c would be able to call this function with proper
arguments. It is at least better in not needing two arguments to
decide what to do, but how is checkpointer.c supposed to know what to
pass for xlogaction? Worse yet, how is checkpointer.c supposed to know
what to pass for EndOfLogTLI and EndOfLog? Actually, EndOfLog doesn't
seem too problematic, because that value has been stored in four (!)
places inside XLogCtl by this code:

LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;

XLogCtl->LogwrtResult = LogwrtResult;

XLogCtl->LogwrtRqst.Write = EndOfLog;
XLogCtl->LogwrtRqst.Flush = EndOfLog;

Presumably we could relatively easily change things around so that we
finish one of those values ... probably one of the "write" values ..
back out of XLogCtl instead of passing it as a parameter. That would
work just as well from the checkpointer as from the startup process,
and there seems to be no way for the value to change until after
XLogAcceptWrites() has been called, so it seems fine. But that doesn't
help for the other arguments. What I'm thinking is that we should just
arrange to store EndOfLogTLI and xlogaction into XLogCtl also, and
then XLogAcceptWrites() can fish those values out of there as well,
which should be enough to make it work and do the same thing
regardless of which process is calling it. But I have run out of time
for today so have not explored coding that up.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-Refactor-some-end-of-recovery-code-out-of-StartupXLO.patch application/octet-stream 16.0 KB
0002-Postpone-some-end-of-recovery-operations-relating-to.patch application/octet-stream 3.3 KB
0003-Create-XLogAcceptWrites-function-with-code-from-Star.patch application/octet-stream 3.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2021-07-23 20:27:38 Re: Have I found an interval arithmetic bug?
Previous Message Andres Freund 2021-07-23 19:57:41 Re: Configure with thread sanitizer fails the thread test