Re: Is Recovery actually paused?

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Is Recovery actually paused?
Date: 2021-01-18 05:56:00
Message-ID: CAFiTN-v_q9nSLt+ww8-vaDpiuWr+HKc=EFkWAXHX-PS+z6XGkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 17, 2021 at 1:48 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Sat, Jan 16, 2021 at 8:59 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote:
> > > > >
> > > > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > > > Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > >
> > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait.
> > > > > > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever,
> > > > > > > > although this setting may not be usual. In addition, some users may set
> > > > > > > > recovery_min_apply_delay for a large. If such users call pg_is_wal_replay_paused,
> > > > > > > > it could wait for a long time.
> > > > > > > >
> > > > > > > > At least, I think we need some descriptions on document to explain
> > > > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > > > >
> > > > > > > Ok
> > > > > >
> > > > > > Fixed this, added some comments in .sgml as well as in function header
> > > > >
> > > > > Thank you for fixing this.
> > > > >
> > > > > Also, is it better to fix the description of pg_wal_replay_pause from
> > > > > "Pauses recovery." to "Request to pause recovery." in according with
> > > > > pg_is_wal_replay_paused?
> > > >
> > > > Okay
> > > >
> > > > >
> > > > > > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
> > > > > > > > control whether this waits for recovery to get paused or not? By setting its
> > > > > > > > default value to true or false, users can use the old format for calling this
> > > > > > > > and the backward compatibility can be maintained.
> > > > > > >
> > > > > > > So basically, if the wait_recovery_pause flag is false then we will
> > > > > > > immediately return true if the pause is requested? I agree that it is
> > > > > > > good to have an API to know whether the recovery pause is requested or
> > > > > > > not but I am not sure is it good idea to make this API serve both the
> > > > > > > purpose? Anyone else have any thoughts on this?
> > > > > > >
> > > > >
> > > > > I think the current pg_is_wal_replay_paused() already has another purpose;
> > > > > this waits recovery to actually get paused. If we want to limit this API's
> > > > > purpose only to return the pause state, it seems better to fix this to return
> > > > > the actual state at the cost of lacking the backward compatibility. If we want
> > > > > to know whether pause is requested, we may add a new API like
> > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually
> > > > > get paused, we may add an option to pg_wal_replay_pause() for this purpose.
> > > > >
> > > > > However, this might be a bikeshedding. If anyone don't care that
> > > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't care either.
> > > >
> > > > I don't think that it will be blocked ever, because
> > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > > > recovery process will not be stuck on waiting for the WAL.
> > > >
> > > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
> > > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.
> > > > >
> > > > > How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during
> > > > > this is blocking.
> > > >
> > > > Yeah, we can do this. I will send the updated patch after putting
> > > > some more thought into these comments. Thanks again for the feedback.
> > > >
> > >
> > > Please find the updated patch.
> >
> > I've looked at the patch. Here are review comments:
> >
> > + /* Recovery pause state */
> > + RecoveryPauseState recoveryPause;
> >
> > Now that the value can have tri-state, how about renaming it to
> > recoveryPauseState?
>
> This makes sense to me.
>
> > ---
> > bool
> > RecoveryIsPaused(void)
> > +{
> > + bool recoveryPause;
> > +
> > + SpinLockAcquire(&XLogCtl->info_lck);
> > + recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ?
> > true : false;
> > + SpinLockRelease(&XLogCtl->info_lck);
> > +
> > + return recoveryPause;
> > +}
> > +
> > +bool
> > +RecoveryPauseRequested(void)
> > {
> > bool recoveryPause;
> >
> > SpinLockAcquire(&XLogCtl->info_lck);
> > - recoveryPause = XLogCtl->recoveryPause;
> > + recoveryPause = (XLogCtl->recoveryPause !=
> > RECOVERY_IN_PROGRESS) ? true : false;
> > SpinLockRelease(&XLogCtl->info_lck);
> >
> > return recoveryPause;
> > }
> >
> > We can write like recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED);
>
> In RecoveryPauseRequested, we just want to know whether the pause is
> requested or not, even if the pause requested and not yet pause then
> also we want to return true. So how
> recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) will work?
>
> > Also, since these functions do the almost same thing, I think we can
> > have a common function to get XLogCtl->recoveryPause, say
> > GetRecoveryPauseState() or GetRecoveryPause(), and both
> > RecoveryIsPaused() and RecoveryPauseRequested() use the returned
> > value. What do you think?
>
> Yeah we can do that.
>
> > ---
> > +static void
> > +CheckAndSetRecoveryPause(void)
> >
> > Maybe we need to declare the prototype of this function like other
> > functions in xlog.c.
>
> Okay
>
> > ---
> > + /*
> > + * If recovery is not in progress anymore then report an error this
> > + * could happen if the standby is promoted while we were waiting for
> > + * recovery to get paused.
> > + */
> > + if (!RecoveryInProgress())
> > + ereport(ERROR,
> > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > + errmsg("recovery is not in progress"),
> > + errhint("Recovery control functions can only be
> > executed during recovery.")));
> >
> > I think we can improve the error message so that we can tell users the
> > standby has been promoted during the wait. For example,
> >
> > errmsg("the standby was promoted during waiting for
> > recovery to be paused")));
> >
> > ---
> > + /* test for recovery pause if user has requested the pause */
> > + if (((volatile XLogCtlData *) XLogCtl)->recoveryPause)
> > + recoveryPausesHere(false);
> > +
> > + now = GetCurrentTimestamp();
> > +
> >
> > Hmm, if the recovery pauses here, the wal receiver isn't launched even
> > when wal_retrieve_retry_interval has passed, which seems not good. I
> > think we want the recovery to be paused but want the wal receiver to
> > continue receiving WAL.
> >
> > And why do we need to set 'now' here?
> >
> > ---
> > /*
> > * Wait until shared recoveryPause flag is cleared.
> > *
> > * endOfRecovery is true if the recovery target is reached and
> > * the paused state starts at the end of recovery because of
> > * recovery_target_action=pause, and false otherwise.
> > *
> > * XXX Could also be done with shared latch, avoiding the pg_usleep loop.
> > * Probably not worth the trouble though. This state shouldn't be one that
> > * anyone cares about server power consumption in.
> > */
> > static void
> > recoveryPausesHere(bool endOfRecovery)
> >
> > We can improve the first sentence in the above function comment to
> > "Wait until shared recoveryPause is set to RECOVERY_IN_PROGRESS" or
> > something.
> >
> > ---
> > - PG_RETURN_BOOL(RecoveryIsPaused());
> > + if (!RecoveryPauseRequested())
> > + PG_RETURN_BOOL(false);
> > +
> > + /* loop until the recovery is actually paused */
> > + while(!RecoveryIsPaused())
> > + {
> > + pg_usleep(10000L); /* wait for 10 msec */
> > +
> > + /* meanwhile if recovery is resume requested then return false */
> > + if (!RecoveryPauseRequested())
> > + PG_RETURN_BOOL(false);
> > +
> > + CHECK_FOR_INTERRUPTS();
> > +
> > + /*
> > + * If recovery is not in progress anymore then report an error this
> > + * could happen if the standby is promoted while we were waiting for
> > + * recovery to get paused.
> > + */
> > + if (!RecoveryInProgress())
> > + ereport(ERROR,
> > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > + errmsg("recovery is not in progress"),
> > + errhint("Recovery control functions can only be
> > executed during recovery.")));
> > + }
> > +
> > + PG_RETURN_BOOL(true);
> >
> > We have the same !RecoveryPauseRequested() check twice, how about the
> > following arrangement?
> >
> > for (;;)
> > {
> > if (!RecoveryPauseRequested())
> > PG_RETURN_BOOL(false);
> >
> > if (RecoveryIsPaused())
> > break;
> >
> > pg_usleep(10000L);
> >
> > CHECK_FOR_INTERRUPTS();
> >
> > if (!RecoveryInProgress())
> > ereport(...);
> > }
> >
> > PG_RETURN_BOOL(true);
> >
> > Regards,
> >
>
> Okay, we can do that. I will make these changes in the next patch.
>

I have fixed the above agreed comments. Please have a look.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v4-0001-pg_is_wal_replay_paused-will-wait-for-recovery-to.patch text/x-patch 10.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hou, Zhijie 2021-01-18 06:02:29 RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Previous Message Masahiko Sawada 2021-01-18 05:54:33 Re: Transactions involving multiple postgres foreign servers, take 2