Re: [Patch] ALTER SYSTEM READ ONLY

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2020-09-15 09:05:39
Message-ID: CAAJ_b96-zeiZKQj5cqc1fx7TNLv726eT+KH6S5J9JesGHvcwdA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres,

The attached patch has fixed the issue that you have raised & I have confirmed
in my previous email. Also, I tried to improve some of the things that you have
pointed but for those changes, I am a little unsure and looking forward to the
inputs/suggestions/confirmation on that, therefore 0003 patch is marked WIP.

Please have a look at my inline reply below for the things that are changes in
the attached version and need inputs:

On Sat, Sep 12, 2020 at 10:52 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Thu, Sep 10, 2020 at 2:33 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
>
> Thanks for your time.
>
> >
> > Thomas, there's one point below that could be relevant for you. You can
> > search for your name and/or checkpoint...
> >
> >
> > On 2020-09-01 16:43:10 +0530, Amul Sul wrote:
> > > diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
> > > index 42050ab7195..0ac826d3c2f 100644
> > > --- a/src/backend/nodes/readfuncs.c
> > > +++ b/src/backend/nodes/readfuncs.c
> > > @@ -2552,6 +2552,19 @@ _readAlternativeSubPlan(void)
> > > READ_DONE();
> > > }
> > >
> > > +/*
> > > + * _readAlterSystemWALProhibitState
> > > + */
> > > +static AlterSystemWALProhibitState *
> > > +_readAlterSystemWALProhibitState(void)
> > > +{
> > > + READ_LOCALS(AlterSystemWALProhibitState);
> > > +
> > > + READ_BOOL_FIELD(WALProhibited);
> > > +
> > > + READ_DONE();
> > > +}
> > > +
> >
> > Why do we need readfuncs support for this?
> >
>
> I thought we need that from your previous comment[1].
>
> > > +
> > > +/*
> > > + * AlterSystemSetWALProhibitState
> > > + *
> > > + * Execute ALTER SYSTEM READ { ONLY | WRITE } statement.
> > > + */
> > > +static void
> > > +AlterSystemSetWALProhibitState(AlterSystemWALProhibitState *stmt)
> > > +{
> > > + /* some code */
> > > + elog(INFO, "AlterSystemSetWALProhibitState() called");
> > > +}
> >
> > As long as it's not implemented it seems better to return an ERROR.
> >
>
> Ok, will add an error in the next version.
>
> > > @@ -3195,6 +3195,16 @@ typedef struct AlterSystemStmt
> > > VariableSetStmt *setstmt; /* SET subcommand */
> > > } AlterSystemStmt;
> > >
> > > +/* ----------------------
> > > + * Alter System Read Statement
> > > + * ----------------------
> > > + */
> > > +typedef struct AlterSystemWALProhibitState
> > > +{
> > > + NodeTag type;
> > > + bool WALProhibited;
> > > +} AlterSystemWALProhibitState;
> > > +
> >
> > All the nearby fields use under_score_style names.
> >
>
> I am not sure which nearby fields having the underscore that you are referring
> to. Probably "WALProhibited" needs to be renamed to "walprohibited" to be
> inline with the nearby fields.
>
> >
> > > From f59329e4a7285c5b132ca74473fe88e5ba537254 Mon Sep 17 00:00:00 2001
> > > From: Amul Sul <amul(dot)sul(at)enterprisedb(dot)com>
> > > Date: Fri, 19 Jun 2020 06:29:36 -0400
> > > Subject: [PATCH v6 3/5] Implement ALTER SYSTEM READ ONLY using global barrier.
> > >
> > > Implementation:
> > >
> > > 1. When a user tried to change server state to WAL-Prohibited using
> > > ALTER SYSTEM READ ONLY command; AlterSystemSetWALProhibitState()
> > > raises request to checkpointer by marking current state to inprogress in
> > > shared memory. Checkpointer, noticing that the current state is has
> >
> > "is has"
> >
> > > WALPROHIBIT_TRANSITION_IN_PROGRESS flag set, does the barrier request, and
> > > then acknowledges back to the backend who requested the state change once
> > > the transition has been completed. Final state will be updated in control
> > > file to make it persistent across the system restarts.
> >
> > What makes checkpointer the right backend to do this work?
> >
>
> Once we've initiated the change to a read-only state, we probably want to
> always either finish that change or go back to read-write, even if the process
> that initiated the change is interrupted. Leaving the system in a
> half-way-in-between state long term seems bad. Maybe we would have put some
> background process, but choose the checkpointer in charge of making the state
> change and to avoid the new background process to keep the first version patch
> simple. The checkpointer isn't likely to get killed, but if it does, it will
> be relaunched and the new one can clean things up. Probably later we might want
> such a background worker that will be isn't likely to get killed.
>
> >
> > > 2. When a backend receives the WAL-Prohibited barrier, at that moment if
> > > it is already in a transaction and the transaction already assigned XID,
> > > then the backend will be killed by throwing FATAL(XXX: need more discussion
> > > on this)
> >
> >
> > > 3. Otherwise, if that backend running transaction which yet to get XID
> > > assigned we don't need to do anything special
> >
> > Somewhat garbled sentence...
> >
> >
> > > 4. A new transaction (from existing or new backend) starts as a read-only
> > > transaction.
> >
> > Maybe "(in an existing or in a new backend)"?
> >
> >
> > > 5. Autovacuum launcher as well as checkpointer will don't do anything in
> > > WAL-Prohibited server state until someone wakes us up. E.g. a backend
> > > might later on request us to put the system back to read-write.
> >
> > "will don't do anything", "might later on request us"
> >
>
> Ok, I'll fix all of this. I usually don't much focus on the commit message text
> but I try to make it as much as possible sane enough.
>
> >
> > > 6. At shutdown in WAL-Prohibited mode, we'll skip shutdown checkpoint
> > > and xlog rotation. Starting up again will perform crash recovery(XXX:
> > > need some discussion on this as well) but the end of recovery checkpoint
> > > will be skipped and it will be performed when the system changed to
> > > WAL-Permitted mode.
> >
> > Hm, this has some interesting interactions with some of Thomas' recent
> > hacking.
> >
>
> I would be so thankful for the help.
>
> >
> > > 8. Only super user can toggle WAL-Prohibit state.
> >
> > Hm. I don't quite agree with this. We try to avoid if (superuser())
> > style checks these days, because they can't be granted to other
> > users. Look at how e.g. pg_promote() - an operation of similar severity
> > - is handled. We just revoke the permission from public in
> > system_views.sql:
> > REVOKE EXECUTE ON FUNCTION pg_promote(boolean, integer) FROM public;
> >
>
> Ok, currently we don't have SQL callable function to change the system
> read-write state. Do you want me to add that? If so, any naming suggesting? How
> about pg_make_system_read_only(bool) or have two function as
> pg_make_system_read_only(void) & pg_make_system_read_write(void).
>

In the attached version I added SQL callable function as
pg_alter_wal_prohibit_state(bool), and another suggestion for the naming is
welcome.

For the permission denied error for ASRO READ-ONLY/READ-WRITE, I have added
ereport() in AlterSystemSetWALProhibitState() instead of aclcheck_error() and
the hint is added. Any suggestions?

> >
> > > 9. Add system_is_read_only GUC show the system state -- will true when system
> > > is wal prohibited or in recovery.
> >
> > *shows the system state. There's also some oddity in the second part of
> > the sentence.
> >
> > Is it really correct to show system_is_read_only as true during
> > recovery? For one, recovery could end soon after, putting the system
> > into r/w mode, if it wasn't actually ALTER SYSTEM READ ONLY'd. But also,
> > during recovery the database state actually changes if there are changes
> > to replay. ISTM it would not be a good idea to mix ASRO and
> > pg_is_in_recovery() into one GUC.
> >
>
> Well, whether the system is in recovery or wal prohibited state it is read-only
> for the user perspective, isn't it?
>
> >
> > > --- /dev/null
> > > +++ b/src/backend/access/transam/walprohibit.c
> > > @@ -0,0 +1,321 @@
> > > +/*-------------------------------------------------------------------------
> > > + *
> > > + * walprohibit.c
> > > + * PostgreSQL write-ahead log prohibit states
> > > + *
> > > + *
> > > + * Portions Copyright (c) 2020, PostgreSQL Global Development Group
> > > + *
> > > + * src/backend/access/transam/walprohibit.c
> > > + *
> > > + *-------------------------------------------------------------------------
> > > + */
> > > +#include "postgres.h"
> > > +
> > > +#include "access/walprohibit.h"
> > > +#include "pgstat.h"
> > > +#include "port/atomics.h"
> > > +#include "postmaster/bgwriter.h"
> > > +#include "storage/condition_variable.h"
> > > +#include "storage/procsignal.h"
> > > +#include "storage/shmem.h"
> > > +
> > > +/*
> > > + * Shared-memory WAL prohibit state
> > > + */
> > > +typedef struct WALProhibitStateData
> > > +{
> > > + /* Indicates current WAL prohibit state */
> > > + pg_atomic_uint32 SharedWALProhibitState;
> > > +
> > > + /* Startup checkpoint pending */
> > > + bool checkpointPending;
> > > +
> > > + /* Signaled when requested WAL prohibit state changes */
> > > + ConditionVariable walprohibit_cv;
> >
> > You're using three different naming styles for as many members.
> >
>
> Ill fix in the next version.
>
> >
> > > +/*
> > > + * ProcessBarrierWALProhibit()
> > > + *
> > > + * Handle WAL prohibit state change request.
> > > + */
> > > +bool
> > > +ProcessBarrierWALProhibit(void)
> > > +{
> > > + /*
> > > + * Kill off any transactions that have an XID *before* allowing the system
> > > + * to go WAL prohibit state.
> > > + */
> > > + if (FullTransactionIdIsValid(GetTopFullTransactionIdIfAny()))
> >
> > Hm. I wonder if this check is good enough. If you look at
> > RecordTransactionCommit() we also WAL log in some cases where no xid was
> > assigned. This is particularly true of (auto-)vacuum, but also for HOT
> > pruning.
> >
> > I think it'd be good to put the logic of this check into xlog.c and
> > mirror the logic in RecordTransactionCommit(). And add cross-referencing
> > comments to RecordTransactionCommit and the new function, reminding our
> > futures selves that both places need to be modified.
> >
>
> I am not sure I have understood this, here is the snip from the implementation
> detail from the first post[2]:
>
> "Open transactions that don't have an XID are not killed, but will get an ERROR
> if they try to acquire an XID later, or if they try to write WAL without
> acquiring an XID (e.g. VACUUM). To make that happen, the patch adds a new
> coding rule: a critical section that will write WAL must be preceded by a call
> to CheckWALPermitted(), AssertWALPermitted(), or AssertWALPermitted_HaveXID().
> The latter variants are used when we know for certain that inserting WAL here
> must be OK, either because we have an XID (we would have been killed by a change
> to read-only if one had occurred) or for some other reason."
>
> Do let me know if you want further clarification.
>
> >
> > > + {
> > > + /* Should be here only for the WAL prohibit state. */
> > > + Assert(GetWALProhibitState() & WALPROHIBIT_STATE_READ_ONLY);
> >
> > There are no races where an ASRO READ ONLY is quickly followed by ASRO
> > READ WRITE where this could be reached?
> >
>
> No, right now SetWALProhibitState() doesn't allow two transient wal prohibit
> states at a time.
>
> >
> > > +/*
> > > + * AlterSystemSetWALProhibitState()
> > > + *
> > > + * Execute ALTER SYSTEM READ { ONLY | WRITE } statement.
> > > + */
> > > +void
> > > +AlterSystemSetWALProhibitState(AlterSystemWALProhibitState *stmt)
> > > +{
> > > + uint32 state;
> > > +
> > > + if (!superuser())
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > > + errmsg("must be superuser to execute ALTER SYSTEM command")));
> >
> > See comments about this above.
> >
> >
> > > + /* Alter WAL prohibit state not allowed during recovery */
> > > + PreventCommandDuringRecovery("ALTER SYSTEM");
> > > +
> > > + /* Requested state */
> > > + state = stmt->WALProhibited ?
> > > + WALPROHIBIT_STATE_READ_ONLY : WALPROHIBIT_STATE_READ_WRITE;
> > > +
> > > + /*
> > > + * Since we yet to convey this WAL prohibit state to all backend mark it
> > > + * in-progress.
> > > + */
> > > + state |= WALPROHIBIT_TRANSITION_IN_PROGRESS;
> > > +
> > > + if (!SetWALProhibitState(state))
> > > + return; /* server is already in the desired state */
> > > +
> >
> > This use of bitmasks seems unnecessary to me. I'd rather have one param
> > for WALPROHIBIT_STATE_READ_ONLY / WALPROHIBIT_STATE_READ_WRITE and one
> > for WALPROHIBIT_TRANSITION_IN_PROGRESS
> >
>
> Ok.
>
> How about the new version of SetWALProhibitState function as :
> SetWALProhibitState(bool wal_prohibited, bool is_final_state) ?
>

I have added the same.

> >
> >
> > > +/*
> > > + * RequestWALProhibitChange()
> > > + *
> > > + * Request checkpointer to make the WALProhibitState to read-only.
> > > + */
> > > +static void
> > > +RequestWALProhibitChange(void)
> > > +{
> > > + /* Must not be called from checkpointer */
> > > + Assert(!AmCheckpointerProcess());
> > > + Assert(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS);
> > > +
> > > + /*
> > > + * If in a standalone backend, just do it ourselves.
> > > + */
> > > + if (!IsPostmasterEnvironment)
> > > + {
> > > + CompleteWALProhibitChange(GetWALProhibitState());
> > > + return;
> > > + }
> > > +
> > > + send_signal_to_checkpointer(SIGINT);
> > > +
> > > + /* Wait for the state to change to read-only */
> > > + ConditionVariablePrepareToSleep(&WALProhibitState->walprohibit_cv);
> > > + for (;;)
> > > + {
> > > + /* We'll be done once in-progress flag bit is cleared */
> > > + if (!(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > + break;
> > > +
> > > + ConditionVariableSleep(&WALProhibitState->walprohibit_cv,
> > > + WAIT_EVENT_WALPROHIBIT_STATE_CHANGE);
> > > + }
> > > + ConditionVariableCancelSleep();
> >
> > What if somebody concurrently changes the state back to READ WRITE?
> > Won't we unnecessarily wait here?
> >
>
> Yes, there will be wait.
>
> > That's probably fine, because we would just wait until that transition
> > is complete too. But at least a comment about that would be
> > good. Alternatively a "ASRO transitions completed counter" or such might
> > be a better idea?
> >
>
> Ok, will add comments but could you please elaborate little a bit about "ASRO
> transitions completed counter" and is there any existing counter I can refer
> to?
>
> >
> > > +/*
> > > + * CompleteWALProhibitChange()
> > > + *
> > > + * Checkpointer will call this to complete the requested WAL prohibit state
> > > + * transition.
> > > + */
> > > +void
> > > +CompleteWALProhibitChange(uint32 wal_state)
> > > +{
> > > + uint64 barrierGeneration;
> > > +
> > > + /*
> > > + * Must be called from checkpointer. Otherwise, it must be single-user
> > > + * backend.
> > > + */
> > > + Assert(AmCheckpointerProcess() || !IsPostmasterEnvironment);
> > > + Assert(wal_state & WALPROHIBIT_TRANSITION_IN_PROGRESS);
> > > +
> > > + /*
> > > + * WAL prohibit state change is initiated. We need to complete the state
> > > + * transition by setting requested WAL prohibit state in all backends.
> > > + */
> > > + elog(DEBUG1, "waiting for backends to adopt requested WAL prohibit state");
> > > +
> > > + /* Emit global barrier */
> > > + barrierGeneration = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_WALPROHIBIT);
> > > + WaitForProcSignalBarrier(barrierGeneration);
> > > +
> > > + /* And flush all writes. */
> > > + XLogFlush(GetXLogWriteRecPtr());
> >
> > Hm, maybe I'm missing something, but why is the write pointer the right
> > thing to flush? That won't include records that haven't been written to
> > disk yet... We also need to trigger writing out all WAL that is as of
> > yet unwritten, no? Without having thought a lot about it, it seems that
> > GetXLogInsertRecPtr() would be the right thing to flush?
> >
>
> TBH, I am not an expert in this area. I wants to flush the latest record
> pointer that needs to be flushed, I think GetXLogInsertRecPtr() would be fine
> if is the latest one. Note that wal flushes are not blocked in read-only mode.
>

Used GetXLogInsertRecPtr().

> >
> > > + /* Set final state by clearing in-progress flag bit */
> > > + if (SetWALProhibitState(wal_state & ~(WALPROHIBIT_TRANSITION_IN_PROGRESS)))
> > > + {
> > > + bool wal_prohibited;
> > > +
> > > + wal_prohibited = (wal_state & WALPROHIBIT_STATE_READ_ONLY) != 0;
> > > +
> > > + /* Update the control file to make state persistent */
> > > + SetControlFileWALProhibitFlag(wal_prohibited);
> >
> > Hm. Is there an issue with not WAL logging the control file change? Is
> > there a scenario where we a crash + recovery would end up overwriting
> > this?
> >
>
> I am not sure. If the system crash before update this that means we haven't
> acknowledged the system state change. And the server will be restarted with the
> previous state.
>
> Could you please explain what bothering you.
>
> >
> > > + if (wal_prohibited)
> > > + ereport(LOG, (errmsg("system is now read only")));
> > > + else
> > > + {
> > > + /*
> > > + * Request checkpoint if the end-of-recovery checkpoint has been
> > > + * skipped previously.
> > > + */
> > > + if (WALProhibitState->checkpointPending)
> > > + {
> > > + RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
> > > + CHECKPOINT_IMMEDIATE);
> > > + WALProhibitState->checkpointPending = false;
> > > + }
> > > + ereport(LOG, (errmsg("system is now read write")));
> > > + }
> > > + }
> > > +
> > > + /* Wake up the backend who requested the state change */
> > > + ConditionVariableBroadcast(&WALProhibitState->walprohibit_cv);
> >
> > Could be multiple backends, right?
> >
>
> Yes, you are correct, will fix that.
>
> >
> > > +}
> > > +
> > > +/*
> > > + * GetWALProhibitState()
> > > + *
> > > + * Atomically return the current server WAL prohibited state
> > > + */
> > > +uint32
> > > +GetWALProhibitState(void)
> > > +{
> > > + return pg_atomic_read_u32(&WALProhibitState->SharedWALProhibitState);
> > > +}
> >
> > Is there an issue with needing memory barriers here?
> >
> >
> > > +/*
> > > + * SetWALProhibitState()
> > > + *
> > > + * Change current WAL prohibit state to the input state.
> > > + *
> > > + * If the server is already completely moved to the requested WAL prohibit
> > > + * state, or if the desired state is same as the current state, return false,
> > > + * indicating that the server state did not change. Else return true.
> > > + */
> > > +bool
> > > +SetWALProhibitState(uint32 new_state)
> > > +{
> > > + bool state_updated = false;
> > > + uint32 cur_state;
> > > +
> > > + cur_state = GetWALProhibitState();
> > > +
> > > + /* Server is already in requested state */
> > > + if (new_state == cur_state ||
> > > + new_state == (cur_state | WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > + return false;
> > > +
> > > + /* Prevent concurrent contrary in progress transition state setting */
> > > + if ((new_state & WALPROHIBIT_TRANSITION_IN_PROGRESS) &&
> > > + (cur_state & WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > + {
> > > + if (cur_state & WALPROHIBIT_STATE_READ_ONLY)
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > + errmsg("system state transition to read only is already in progress"),
> > > + errhint("Try after sometime again.")));
> > > + else
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > + errmsg("system state transition to read write is already in progress"),
> > > + errhint("Try after sometime again.")));
> > > + }
> > > +
> > > + /* Update new state in share memory */
> > > + state_updated =
> > > + pg_atomic_compare_exchange_u32(&WALProhibitState->SharedWALProhibitState,
> > > + &cur_state, new_state);
> > > +
> > > + if (!state_updated)
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > + errmsg("system read write state concurrently changed"),
> > > + errhint("Try after sometime again.")));
> > > +
> >
> > I don't think it's safe to use pg_atomic_compare_exchange_u32() outside
> > of a loop. I think there's platforms (basically all load-linked /
> > store-conditional architectures) where than can fail spuriously.
> >
> > Also, there's no memory barrier around GetWALProhibitState, so there's
> > no guarantee it's not an out-of-date value you're starting with.
> >
>
> How about having some kind of lock instead what Robert have suggested
> previously[3] ?
>

I would like to discuss this point more. In the attached version I have added
WALProhibitLock to protect shared walprohibit state updates. I was a little
unsure do we want another spinlock what XLogCtlData has which is mostly used to
read the shared variable and for the update, both are used e.g. LogwrtResult.

Right now I haven't added and shared walprohibit state was fetch using a
volatile pointer. Do we need a spinlock there, I am not sure why? Thoughts?

> >
> > > +/
> > > + * MarkCheckPointSkippedInWalProhibitState()
> > > + *
> > > + * Sets checkpoint pending flag so that it can be performed next time while
> > > + * changing system state to WAL permitted.
> > > + */
> > > +void
> > > +MarkCheckPointSkippedInWalProhibitState(void)
> > > +{
> > > + WALProhibitState->checkpointPending = true;
> > > +}
> >
> > I don't *at all* like this living outside of xlog.c. I think this should
> > be moved there, and merged with deferring checkpoints in other cases
> > (promotions, not immediately performing a checkpoint after recovery).
>
> Here we want to perform the checkpoint sometime quite later when the
> system state changes to read-write. For that, I think we need some flag
> if we want this in xlog.c then we can have that flag in XLogCtl.
>

Right now I have added a new variable to XLogCtlData and moved this code to
xlog.c.

>
> > There's state in ControlFile *and* here for essentially the same thing.
> >
>
> I am sorry to trouble you much, but I haven't understood this too.
>
> >
> >
> > > + * If it is not currently possible to insert write-ahead log records,
> > > + * either because we are still in recovery or because ALTER SYSTEM READ
> > > + * ONLY has been executed, force this to be a read-only transaction.
> > > + * We have lower level defences in XLogBeginInsert() and elsewhere to stop
> > > + * us from modifying data during recovery when !XLogInsertAllowed(), but
> > > + * this gives the normal indication to the user that the transaction is
> > > + * read-only.
> > > + *
> > > + * On the other hand, we only need to set the startedInRecovery flag when
> > > + * the transaction started during recovery, and not when WAL is otherwise
> > > + * prohibited. This information is used by RelationGetIndexScan() to
> > > + * decide whether to permit (1) relying on existing killed-tuple markings
> > > + * and (2) further killing of index tuples. Even when WAL is prohibited
> > > + * on the master, it's still the master, so the former is OK; and since
> > > + * killing index tuples doesn't generate WAL, the latter is also OK.
> > > + * See comments in RelationGetIndexScan() and MarkBufferDirtyHint().
> > > + */
> > > + XactReadOnly = DefaultXactReadOnly || !XLogInsertAllowed();
> > > + s->startedInRecovery = RecoveryInProgress();
> >
> > It's somewhat ugly that we call RecoveryInProgress() once in
> > XLogInsertAllowed() and then again directly here... It's probably fine
> > runtime cost wise, but...
> >
> >
> > > /*
> > > * Subroutine to try to fetch and validate a prior checkpoint record.
> > > *
> > > @@ -8508,9 +8564,13 @@ ShutdownXLOG(int code, Datum arg)
> > > */
> > > WalSndWaitStopping();
> > >
> > > + /*
> > > + * The restartpoint, checkpoint, or xlog rotation will be performed if the
> > > + * WAL writing is permitted.
> > > + */
> > > if (RecoveryInProgress())
> > > CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
> > > - else
> > > + else if (XLogInsertAllowed())
> >
> > Not sure I like going via XLogInsertAllowed(), that seems like a
> > confusing indirection here. And it encompasses things we atually don't
> > want to check for - it's fragile to also look at LocalXLogInsertAllowed
> > here imo.
> >
> >
> > > ShutdownCLOG();
> > > ShutdownCommitTs();
> > > ShutdownSUBTRANS();
> > > diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
> > > index 1b8cd7bacd4..aa4cdd57ec1 100644
> > > --- a/src/backend/postmaster/autovacuum.c
> > > +++ b/src/backend/postmaster/autovacuum.c
> > > @@ -652,6 +652,10 @@ AutoVacLauncherMain(int argc, char *argv[])
> > >
> > > HandleAutoVacLauncherInterrupts();
> > >
> > > + /* If the server is read only just go back to sleep. */
> > > + if (!XLogInsertAllowed())
> > > + continue;
> > > +
> >
> > I think we really should have a different functions for places like
> > this. We don't want to generally hide bugs like e.g. starting the
> > autovac launcher in recovery, but this would.
> >
>
> So, we need a separate function like XLogInsertAllowed() and a global variable
> like LocalXLogInsertAllowed for the caching wal prohibit state.
>
> >
> > > @@ -342,6 +344,28 @@ CheckpointerMain(void)
> > > AbsorbSyncRequests();
> > > HandleCheckpointerInterrupts();
> > >
> > > + wal_state = GetWALProhibitState();
> > > +
> > > + if (wal_state & WALPROHIBIT_TRANSITION_IN_PROGRESS)
> > > + {
> > > + /* Complete WAL prohibit state change request */
> > > + CompleteWALProhibitChange(wal_state);
> > > + continue;
> > > + }
> > > + else if (wal_state & WALPROHIBIT_STATE_READ_ONLY)
> > > + {
> > > + /*
> > > + * Don't do anything until someone wakes us up. For example a
> > > + * backend might later on request us to put the system back to
> > > + * read-write wal prohibit sate.
> > > + */
> > > + (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, -1,
> > > + WAIT_EVENT_CHECKPOINTER_MAIN);
> > > + continue;
> > > + }
> > > + Assert(wal_state == WALPROHIBIT_STATE_READ_WRITE);
> > > +
> > > /*
> > > * Detect a pending checkpoint request by checking whether the flags
> > > * word in shared memory is nonzero. We shouldn't need to acquire the
> > > @@ -1323,3 +1347,16 @@ FirstCallSinceLastCheckpoint(void)
> > >
> > > return FirstCall;
> > > }
> >
> > So, if we're in the middle of a paced checkpoint with a large
> > checkpoint_timeout - a sensible real world configuration - we'll not
> > process ASRO until that checkpoint is over? That seems very much not
> > practical. What am I missing?
> >
>
> Yes, the process doing ASRO will wait until that checkpoint is over.
>
> >
> > > +/*
> > > + * send_signal_to_checkpointer allows a process to send a signal to the checkpoint process.
> > > + */
> > > +void
> > > +send_signal_to_checkpointer(int signum)
> > > +{
> > > + if (CheckpointerShmem->checkpointer_pid == 0)
> > > + elog(ERROR, "checkpointer is not running");
> > > +
> > > + if (kill(CheckpointerShmem->checkpointer_pid, signum) != 0)
> > > + elog(ERROR, "could not signal checkpointer: %m");
> > > +}
> >
> > Sudden switch to a different naming style...
> >
>
> My bad, sorry, will fix that.
>
> 1] http://postgr.es/m/20200724020402.2byiiufsd7pw4hsp@alap3.anarazel.de
> 2] http://postgr.es/m/CAAJ_b97KZzdJsffwRK7w0XU5HnXkcgKgTR69t8cOZztsyXjkQw@mail.gmail.com
> 3] http://postgr.es/m/CA+TgmoYMyw-m3O5XQ8tRy4mdEArGcfXr+9niO5Fmq1wVdKxYmQ@mail.gmail.com

Thank you !

Regards,
Amul

Attachment Content-Type Size
v7-0004-Error-or-Assert-before-START_CRIT_SECTION-for-WAL.patch application/x-patch 70.8 KB
v7-0003-WIP-Implement-ALTER-SYSTEM-READ-ONLY-using-global.patch application/x-patch 40.5 KB
v7-0002-Add-alter-system-read-only-write-syntax.patch application/x-patch 9.3 KB
v7-0001-Allow-error-or-refusal-while-absorbing-barriers.patch application/x-patch 4.4 KB
v7-0005-WIP-Documentation.patch application/x-patch 6.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-09-15 09:19:17 Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Previous Message Matthias van de Meent 2020-09-15 09:03:14 Re: [patch] _bt_binsrch* improvements - equal-prefix-skip binary search