Re: [Patch] ALTER SYSTEM READ ONLY

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2021-09-15 10:49:17
Message-ID: CAAJ_b97vxaTA-GZK4AyCJvrpzUsAhGLugXaYrE7mybb4FmOarw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

, On Sat, Jul 24, 2021 at 1:33 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Jun 17, 2021 at 1:23 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > Attached is rebase for the latest master head. Also, I added one more
> > refactoring code that deduplicates the code setting database state in the
> > control file. The same code set the database state is also needed for this
> > feature.
>
> I started studying 0001 today and found that it rearranged the order
> of operations in StartupXLOG() more than I was expecting. It does, as
> per previous discussions, move a bunch of things to the place where we
> now call XLogParamters(). But, unsatisfyingly, InRecovery = false and
> XLogReaderFree() then have to move down even further. Since the goal
> here is to get to a situation where we sometimes XLogAcceptWrites()
> after InRecovery = false, it didn't seem nice for this refactoring
> patch to still end up with a situation where this stuff happens while
> InRecovery = true. In fact, with the patch, the amount of code that
> runs with InRecovery = true actually *increases*, which is not what I
> think should be happening here. That's why the patch ends up having to
> adjust SetMultiXactIdLimit to not Assert(!InRecovery).
>
> And then I started to wonder how this was ever going to work as part
> of the larger patch set, because as you have it here,
> XLogAcceptWrites() takes arguments XLogReaderState *xlogreader,
> XLogRecPtr EndOfLog, and TimeLineID EndOfLogTLI and if the
> checkpointer is calling that at a later time after the user issues
> pg_prohibit_wal(false), it's going to have none of those things. So I
> had a quick look at that part of the code and found this in
> checkpointer.c:
>
> XLogAcceptWrites(true, NULL, InvalidXLogRecPtr, 0);
>
> For those following along from home, the additional "true" is a bool
> needChkpt argument added to XLogAcceptWrites() by 0003. Well, none of
> this is very satisfying. The whole purpose of passing the xlogreader
> is so we can figure out whether we need a checkpoint (never mind the
> question of whether the existing algorithm for determining that is
> really sensible) but now we need a second argument that basically
> serves the same purpose since one of the two callers to this function
> won't have an xlogreader. And then we're passing the EndOfLog and
> EndOfLogTLI as dummy values which seems like it's probably just
> totally wrong, but if for some reason it works correctly there sure
> don't seem to be any comments explaining why.
>
> So I started doing a bit of hacking myself and ended up with the
> attached, which I think is not completely the right thing yet but I
> think it's better than your version. I split this into three parts.
> 0001 splits up the logic that currently decides whether to write an
> end-of-recovery record or a checkpoint record and if the latter how
> the checkpoint ought to be performed into two functions.
> DetermineRecoveryXlogAction() figures out what we want to do, and
> PerformRecoveryXlogAction() does it. It also moves the code to run
> recovery_end_command and related stuff into a new function
> CleanupAfterArchiveRecovery(). 0002 then builds on this by postponing
> UpdateFullPageWrites(), PerformRecoveryXLogAction(), and
> CleanupAfterArchiveRecovery() to just before we
> XLogReportParameters(). Because of the refactoring done by 0001, this
> is only a small amount of code movement. Because of the separation
> between DetermineRecoveryXlogAction() and PerformRecoveryXlogAction(),
> the latter doesn't need the xlogreader. So we can do
> DetermineRecoveryXlogAction() at the same time as now, while the
> xlogreader is available, and then we don't need it later when we
> PerformRecoveryXlogAction(), because we already know what we need to
> know. I think this is all fine as far as it goes.
>
> My 0003 is where I see some lingering problems. It creates
> XLogAcceptWrites(), moves the appropriate stuff there, and doesn't
> need the xlogreader. But it doesn't really solve the problem of how
> checkpointer.c would be able to call this function with proper
> arguments. It is at least better in not needing two arguments to
> decide what to do, but how is checkpointer.c supposed to know what to
> pass for xlogaction? Worse yet, how is checkpointer.c supposed to know
> what to pass for EndOfLogTLI and EndOfLog? Actually, EndOfLog doesn't
> seem too problematic, because that value has been stored in four (!)
> places inside XLogCtl by this code:
>
> LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;
>
> XLogCtl->LogwrtResult = LogwrtResult;
>
> XLogCtl->LogwrtRqst.Write = EndOfLog;
> XLogCtl->LogwrtRqst.Flush = EndOfLog;
>
> Presumably we could relatively easily change things around so that we
> finish one of those values ... probably one of the "write" values ..
> back out of XLogCtl instead of passing it as a parameter. That would
> work just as well from the checkpointer as from the startup process,
> and there seems to be no way for the value to change until after
> XLogAcceptWrites() has been called, so it seems fine. But that doesn't
> help for the other arguments. What I'm thinking is that we should just
> arrange to store EndOfLogTLI and xlogaction into XLogCtl also, and
> then XLogAcceptWrites() can fish those values out of there as well,
> which should be enough to make it work and do the same thing
> regardless of which process is calling it. But I have run out of time
> for today so have not explored coding that up.
>

I have spent some time thinking about making XLogAcceptWrites()
independent, and for that, we need to get rid of its arguments which
are available only in the startup process. The first argument
xlogaction deduced by the DetermineRecoveryXlogAction(). If we are able to
make this function logic independent and can deduce that xlogaction in
any process, we can skip xlogaction argument passing.

DetermineRecoveryXlogAction() function depends on a few global
variables, valid only in the startup process are InRecovery,
ArchiveRecoveryRequested & LocalPromoteIsTriggered. Out of
three LocalPromoteIsTriggered's value is already available in shared
memory and that can be fetched by calling LocalPromoteIsTriggered().
InRecovery's value can be guessed by as long as DBState in the control
file doesn't get changed unexpectedly until XLogAcceptWrites()
executes. If the DBState was not a clean shutdown, then surely the
server has gone through recovery. If we could rely on DBState in the
control file then we are good to go. For the last one,
ArchiveRecoveryRequested, I don't see any existing and appropriate
shared memory or control file information, so that can be identified
if the archive recovery was requested or not. Initially, I thought to
use SharedRecoveryState which is always set to RECOVERY_STATE_ARCHIVE,
if the archive recovery requested. But there is another case where
SharedRecoveryState could be RECOVERY_STATE_ARCHIVE irrespective of
ArchiveRecoveryRequested value, that is the presence of a backup label
file. If we want to use SharedRecoveryState, we need one more state
which could differentiate between ArchiveRecoveryRequested and the
backup label file presence case. To move ahead, I have copied
ArchiveRecoveryRequested into shared memory and it will be cleared
once archive cleanup is finished. With all these changes, we could get
rid of xlogaction argument and DetermineRecoveryXlogAction() function.
Could move its logic to PerformRecoveryXLogAction() directly.

Now, the remaining two arguments of XLogAcceptWrites() are required
for the CleanupAfterArchiveRecovery() function. Along with these two
arguments, this function requires ArchiveRecoveryRequested and
ThisTimeLineID which are again global variables. With the previous
changes, we have got ArchiveRecoveryRequested into shared memory.
And for ThisTimeLineID, I don't think we need to do anything since this
value is available with all the backend as per the following comment:

"
/*
* ThisTimeLineID will be same in all backends --- it identifies current
* WAL timeline for the database system.
*/
TimeLineID ThisTimeLineID = 0;
"

In addition to the four places that Robert has pointed for EndOfLog,
XLogCtl->lastSegSwitchLSN also holds EndOfLog value and that doesn't
seem to change until WAL write is enabled. For EndOfLogTLI, I think we
can safely use XLogCtl->replayEndTLI. Currently, The EndOfLogTLI is
the timeline ID of the last record that xlogreader reads, but this
xlogreader was simply re-fetching the last record which we have
replied in redo loop if it was in recovery, if not in recovery, we
don't need to worry since this value is needed only in case of
ArchiveRecoveryRequested = true, which implicitly forces redo and sets
replayEndTLI value.

With all the above changes XLogAcceptWrites() can be called from other
processes but I haven't tested that. This finding is still not
complete and not too clean, perhaps, posting the patches with
aforesaid changes just to confirm the direction and forward the
discussion, thanks.

Regards,
Amul

Attachment Content-Type Size
v34-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch application/x-patch 13.4 KB
v34-0003-Create-XLogAcceptWrites-function-with-code-from-.patch application/x-patch 4.2 KB
v34-0002-Postpone-some-end-of-recovery-operations-relatin.patch application/x-patch 3.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2021-09-15 10:59:17 Re: Trigger position
Previous Message bucoo@sohu.com 2021-09-15 10:34:01 Re: Re: parallel distinct union and aggregate support patch