Re: [Patch] ALTER SYSTEM READ ONLY

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2021-03-04 17:32:27
Message-ID: CAAJ_b95q6iBqgTpWBp937cLwVVvMQ63t1QuCzidqCry_QpMm_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 3, 2021 at 8:56 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Mar 2, 2021 at 7:22 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > Why do we need to move promote related code in XLogAcceptWrites?
> > IMHO, this promote related handling should be in StartupXLOG only.
> > That will look cleaner.
>
> The key design question here, at least in my mind, is what exactly
> happens after prohibit-WAL + system-crash + recovery-finishes. We
> clearly can't write the checkpoint or end-of-recovery record and
> proceed with business as usual, but are we still in recovery? Either
> (1) we are technically still in recovery, stopping just short of
> entering normal running, and will emerge from recovery when WAL is
> permitted again; or (2) we have technically finished recovery, but
> deferred some of the actions that would normally occur at that time
> until a later point. Maybe this is academic distinction as much as
> anything, but the idea is if we choose #1 then we should do as little
> as possible at the point when recovery finishes and defer as much as
> possible until we actually enter normal running; whereas if we choose
> #2 we should do as much as possible at the point when recovery
> finishes and defer only those things which absolutely have to be
> deferred. That said, I and I think also Andres are voting for #2.
>
> But if we go that way, that precludes what you are proposing here. If
> we picked #1 then it would be natural for the startup process to
> remain active and the control file update to be postponed until WAL
> writes are re-enabled; but under model #2 we want, if possible, for
> the startup process to exit and the control file update to happen
> normally, and only the writing of the actual WAL records to be
> deferred.
>

Current patch doing a mix of both, startup process exits without doing
WAL writes and control file updates, that happens later when system
changes to read-write.

> What I find much odder, looking at the present patch, is that
> PerformPendingStartupOperations() gets called from pg_prohibit_wal()
> rather than by the checkpointer. If the checkpointer is the process
> that is in charge of coordinating the change between a read-only state
> and a read-write state, then it ought to also do this. I also think
> that the PerformPendingStartupOperations() wrapper is unnecessary.
> Just invert the sense of the XLogCtl flag: xlogAllowWritesDone, rather
> than startupCrashRecoveryPending, and have XLogAcceptWrites() set it
> (and return without doing anything if it's already set). Then the
> checkpointer can just call the function unconditionally whenever we go
> read-write, and for a bonus we will have much better naming
> consistency, rather than calling the same thing "xlog accept writes"
> in one place, "pending startup operations" in another, and "startup
> crash recovery pending" in a third.
>

Ok, in the attached version, I have used the xlogAllowWritesDone variable.
To match the naming sense, it should be set to 'false' initially and
should get set to 'true' when the XLogAcceptWrites() operation completes.

I have removed the PerformPendingStartupOperations() wrapper function and I have
slightly changed XLogAcceptWrites() to minimize its parameter count so that it
can use available global variable values instead of parameters. Unfortunately,
it cannot be called from checkpointer unconditionally, it will create a race
with startup process when startup process still in recovery and checkpointer
launches and see that xlogAllowWritesDone = false, will go-ahead for those wal
write operations and end-of-recovery checkpoint which will be a
disaster. Therefore, I moved this XLogAcceptWrites() function inside
ProcessWALProhibitStateChangeRequest() and called when the system is in
GOING_READ_WRITE transition state. Since ProcessWALProhibitStateChangeRequest()
gets called from a different places of checkpointer process which creates a
cascaded call to XLogAcceptWrites() function, to avoid that I am updating
xlogAllowWritesDone = true immediately after it gets checked in
XLogAcceptWrites() which I think is not the right approach, technically, it
should be updated at the end of XLogAcceptWrites().

I think instead of xlogAllowWritesDone, we should use invert of it, as
the previous, e.g.xlogAllowWritesPending or xlogAllowWritesSkipped or something
else and that will be get explicitly set 'true' when we skip XLogAcceptWrites()
call. That will avoid the race of checkpointer process with the startup since
initially, it will be 'false', and if it is 'false' we will return immediately
from XLogAcceptWrites(). Also, we don't need to move XLogAcceptWrites() inside
ProcessWALProhibitStateChangeRequest(), it can be called from checkpointerMain()
loop which also avoids cascade calls and we don't need to update it until we
complete those write operations. Thoughts/Comments?

> Since this feature is basically no longer "alter system read only" but
> rather "pg_prohibit_wal" I think we also ought to rename the GUC,
> system_is_read_only -> wal_prohibited.
>

Done.

Regards,
Amul

Attachment Content-Type Size
v17-0002-Error-or-Assert-before-START_CRIT_SECTION-for-WA.patch application/x-patch 69.2 KB
v17-0003-WIP-Documentation.patch application/x-patch 10.3 KB
v17-0001-Implement-wal-prohibit-state-using-global-barrie.patch application/x-patch 51.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ibrar Ahmed 2021-03-04 17:37:23 Re: Corruption during WAL replay
Previous Message Robert Haas 2021-03-04 17:27:54 Re: pg_amcheck contrib application