Re: [Patch] ALTER SYSTEM READ ONLY

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2020-09-23 06:04:41
Message-ID: CAAJ_b94bXooXcp7GBTq_nbpc9wCzNPMgwm6umjBz1E1B-urOeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 15, 2020 at 2:35 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> Hi Andres,
>
> The attached patch has fixed the issue that you have raised & I have confirmed
> in my previous email. Also, I tried to improve some of the things that you have
> pointed but for those changes, I am a little unsure and looking forward to the
> inputs/suggestions/confirmation on that, therefore 0003 patch is marked WIP.
>
> Please have a look at my inline reply below for the things that are changes in
> the attached version and need inputs:
>
> On Sat, Sep 12, 2020 at 10:52 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> >
> > On Thu, Sep 10, 2020 at 2:33 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
[... Skipped ....]
> > >
> > >
> > > > +/*
> > > > + * RequestWALProhibitChange()
> > > > + *
> > > > + * Request checkpointer to make the WALProhibitState to read-only.
> > > > + */
> > > > +static void
> > > > +RequestWALProhibitChange(void)
> > > > +{
> > > > + /* Must not be called from checkpointer */
> > > > + Assert(!AmCheckpointerProcess());
> > > > + Assert(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS);
> > > > +
> > > > + /*
> > > > + * If in a standalone backend, just do it ourselves.
> > > > + */
> > > > + if (!IsPostmasterEnvironment)
> > > > + {
> > > > + CompleteWALProhibitChange(GetWALProhibitState());
> > > > + return;
> > > > + }
> > > > +
> > > > + send_signal_to_checkpointer(SIGINT);
> > > > +
> > > > + /* Wait for the state to change to read-only */
> > > > + ConditionVariablePrepareToSleep(&WALProhibitState->walprohibit_cv);
> > > > + for (;;)
> > > > + {
> > > > + /* We'll be done once in-progress flag bit is cleared */
> > > > + if (!(GetWALProhibitState() & WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > > + break;
> > > > +
> > > > + ConditionVariableSleep(&WALProhibitState->walprohibit_cv,
> > > > + WAIT_EVENT_WALPROHIBIT_STATE_CHANGE);
> > > > + }
> > > > + ConditionVariableCancelSleep();
> > >
> > > What if somebody concurrently changes the state back to READ WRITE?
> > > Won't we unnecessarily wait here?
> > >
> >
> > Yes, there will be wait.
> >
> > > That's probably fine, because we would just wait until that transition
> > > is complete too. But at least a comment about that would be
> > > good. Alternatively a "ASRO transitions completed counter" or such might
> > > be a better idea?
> > >
> >
> > Ok, will add comments but could you please elaborate little a bit about "ASRO
> > transitions completed counter" and is there any existing counter I can refer
> > to?
> >

In an off-list discussion, Robert had explained to me this counter thing and
its requirement.

I tried to add the same as "shared WAL prohibited state generation" in the
attached version. The implementation is quite similar to the generation counter
in the super barrier. In the attached version, when a backend makes a request
for the WAL prohibit state changes then a generation number will be given to
that backend to wait on and that wait will be ended when the shared generation
counter changes.

> > >
[... Skipped ....]
> > > > +/*
> > > > + * SetWALProhibitState()
> > > > + *
> > > > + * Change current WAL prohibit state to the input state.
> > > > + *
> > > > + * If the server is already completely moved to the requested WAL prohibit
> > > > + * state, or if the desired state is same as the current state, return false,
> > > > + * indicating that the server state did not change. Else return true.
> > > > + */
> > > > +bool
> > > > +SetWALProhibitState(uint32 new_state)
> > > > +{
> > > > + bool state_updated = false;
> > > > + uint32 cur_state;
> > > > +
> > > > + cur_state = GetWALProhibitState();
> > > > +
> > > > + /* Server is already in requested state */
> > > > + if (new_state == cur_state ||
> > > > + new_state == (cur_state | WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > > + return false;
> > > > +
> > > > + /* Prevent concurrent contrary in progress transition state setting */
> > > > + if ((new_state & WALPROHIBIT_TRANSITION_IN_PROGRESS) &&
> > > > + (cur_state & WALPROHIBIT_TRANSITION_IN_PROGRESS))
> > > > + {
> > > > + if (cur_state & WALPROHIBIT_STATE_READ_ONLY)
> > > > + ereport(ERROR,
> > > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > + errmsg("system state transition to read only is already in progress"),
> > > > + errhint("Try after sometime again.")));
> > > > + else
> > > > + ereport(ERROR,
> > > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > + errmsg("system state transition to read write is already in progress"),
> > > > + errhint("Try after sometime again.")));
> > > > + }
> > > > +
> > > > + /* Update new state in share memory */
> > > > + state_updated =
> > > > + pg_atomic_compare_exchange_u32(&WALProhibitState->SharedWALProhibitState,
> > > > + &cur_state, new_state);
> > > > +
> > > > + if (!state_updated)
> > > > + ereport(ERROR,
> > > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > + errmsg("system read write state concurrently changed"),
> > > > + errhint("Try after sometime again.")));
> > > > +
> > >
> > > I don't think it's safe to use pg_atomic_compare_exchange_u32() outside
> > > of a loop. I think there's platforms (basically all load-linked /
> > > store-conditional architectures) where than can fail spuriously.
> > >
> > > Also, there's no memory barrier around GetWALProhibitState, so there's
> > > no guarantee it's not an out-of-date value you're starting with.
> > >
> >
> > How about having some kind of lock instead what Robert have suggested
> > previously[3] ?
> >
>
> I would like to discuss this point more. In the attached version I have added
> WALProhibitLock to protect shared walprohibit state updates. I was a little
> unsure do we want another spinlock what XLogCtlData has which is mostly used to
> read the shared variable and for the update, both are used e.g. LogwrtResult.
>
> Right now I haven't added and shared walprohibit state was fetch using a
> volatile pointer. Do we need a spinlock there, I am not sure why? Thoughts?
>

I reverted this WALProhibitLock implementation since with changes in the
attached version I don't think we need that locking.

Regards,
Amul

Attachment Content-Type Size
v8-0002-Add-alter-system-read-only-write-syntax.patch application/x-patch 9.3 KB
v8-0001-Allow-error-or-refusal-while-absorbing-barriers.patch application/x-patch 4.4 KB
v8-0005-WIP-Documentation.patch application/x-patch 6.6 KB
v8-0004-Error-or-Assert-before-START_CRIT_SECTION-for-WAL.patch application/x-patch 70.1 KB
v8-0003-Implement-ALTER-SYSTEM-READ-ONLY-using-global-bar.patch application/x-patch 42.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-09-23 06:07:10 Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path
Previous Message Amit Langote 2020-09-23 05:39:55 Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY