Re: Is Recovery actually paused?

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Is Recovery actually paused?
Date: 2021-01-16 03:28:34
Message-ID: CAD21AoC3SEmG+FjEP1P4UcwKxZ21_S0jDTAKkzWYxz=EAeQsbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote:
> > >
> > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait.
> > > > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever,
> > > > > > although this setting may not be usual. In addition, some users may set
> > > > > > recovery_min_apply_delay for a large. If such users call pg_is_wal_replay_paused,
> > > > > > it could wait for a long time.
> > > > > >
> > > > > > At least, I think we need some descriptions on document to explain
> > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > >
> > > > > Ok
> > > >
> > > > Fixed this, added some comments in .sgml as well as in function header
> > >
> > > Thank you for fixing this.
> > >
> > > Also, is it better to fix the description of pg_wal_replay_pause from
> > > "Pauses recovery." to "Request to pause recovery." in according with
> > > pg_is_wal_replay_paused?
> >
> > Okay
> >
> > >
> > > > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
> > > > > > control whether this waits for recovery to get paused or not? By setting its
> > > > > > default value to true or false, users can use the old format for calling this
> > > > > > and the backward compatibility can be maintained.
> > > > >
> > > > > So basically, if the wait_recovery_pause flag is false then we will
> > > > > immediately return true if the pause is requested? I agree that it is
> > > > > good to have an API to know whether the recovery pause is requested or
> > > > > not but I am not sure is it good idea to make this API serve both the
> > > > > purpose? Anyone else have any thoughts on this?
> > > > >
> > >
> > > I think the current pg_is_wal_replay_paused() already has another purpose;
> > > this waits recovery to actually get paused. If we want to limit this API's
> > > purpose only to return the pause state, it seems better to fix this to return
> > > the actual state at the cost of lacking the backward compatibility. If we want
> > > to know whether pause is requested, we may add a new API like
> > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually
> > > get paused, we may add an option to pg_wal_replay_pause() for this purpose.
> > >
> > > However, this might be a bikeshedding. If anyone don't care that
> > > pg_is_wal_replay_paused() can make user wait for a long time, I don't care either.
> >
> > I don't think that it will be blocked ever, because
> > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > recovery process will not be stuck on waiting for the WAL.
> >
> > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
> > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.
> > >
> > > How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during
> > > this is blocking.
> >
> > Yeah, we can do this. I will send the updated patch after putting
> > some more thought into these comments. Thanks again for the feedback.
> >
>
> Please find the updated patch.

I've looked at the patch. Here are review comments:

+ /* Recovery pause state */
+ RecoveryPauseState recoveryPause;

Now that the value can have tri-state, how about renaming it to
recoveryPauseState?

---
bool
RecoveryIsPaused(void)
+{
+ bool recoveryPause;
+
+ SpinLockAcquire(&XLogCtl->info_lck);
+ recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ?
true : false;
+ SpinLockRelease(&XLogCtl->info_lck);
+
+ return recoveryPause;
+}
+
+bool
+RecoveryPauseRequested(void)
{
bool recoveryPause;

SpinLockAcquire(&XLogCtl->info_lck);
- recoveryPause = XLogCtl->recoveryPause;
+ recoveryPause = (XLogCtl->recoveryPause !=
RECOVERY_IN_PROGRESS) ? true : false;
SpinLockRelease(&XLogCtl->info_lck);

return recoveryPause;
}

We can write like recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED);

Also, since these functions do the almost same thing, I think we can
have a common function to get XLogCtl->recoveryPause, say
GetRecoveryPauseState() or GetRecoveryPause(), and both
RecoveryIsPaused() and RecoveryPauseRequested() use the returned
value. What do you think?

---
+static void
+CheckAndSetRecoveryPause(void)

Maybe we need to declare the prototype of this function like other
functions in xlog.c.

---
+ /*
+ * If recovery is not in progress anymore then report an error this
+ * could happen if the standby is promoted while we were waiting for
+ * recovery to get paused.
+ */
+ if (!RecoveryInProgress())
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("recovery is not in progress"),
+ errhint("Recovery control functions can only be
executed during recovery.")));

I think we can improve the error message so that we can tell users the
standby has been promoted during the wait. For example,

errmsg("the standby was promoted during waiting for
recovery to be paused")));

---
+ /* test for recovery pause if user has requested the pause */
+ if (((volatile XLogCtlData *) XLogCtl)->recoveryPause)
+ recoveryPausesHere(false);
+
+ now = GetCurrentTimestamp();
+

Hmm, if the recovery pauses here, the wal receiver isn't launched even
when wal_retrieve_retry_interval has passed, which seems not good. I
think we want the recovery to be paused but want the wal receiver to
continue receiving WAL.

And why do we need to set 'now' here?

---
/*
* Wait until shared recoveryPause flag is cleared.
*
* endOfRecovery is true if the recovery target is reached and
* the paused state starts at the end of recovery because of
* recovery_target_action=pause, and false otherwise.
*
* XXX Could also be done with shared latch, avoiding the pg_usleep loop.
* Probably not worth the trouble though. This state shouldn't be one that
* anyone cares about server power consumption in.
*/
static void
recoveryPausesHere(bool endOfRecovery)

We can improve the first sentence in the above function comment to
"Wait until shared recoveryPause is set to RECOVERY_IN_PROGRESS" or
something.

---
- PG_RETURN_BOOL(RecoveryIsPaused());
+ if (!RecoveryPauseRequested())
+ PG_RETURN_BOOL(false);
+
+ /* loop until the recovery is actually paused */
+ while(!RecoveryIsPaused())
+ {
+ pg_usleep(10000L); /* wait for 10 msec */
+
+ /* meanwhile if recovery is resume requested then return false */
+ if (!RecoveryPauseRequested())
+ PG_RETURN_BOOL(false);
+
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * If recovery is not in progress anymore then report an error this
+ * could happen if the standby is promoted while we were waiting for
+ * recovery to get paused.
+ */
+ if (!RecoveryInProgress())
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("recovery is not in progress"),
+ errhint("Recovery control functions can only be
executed during recovery.")));
+ }
+
+ PG_RETURN_BOOL(true);

We have the same !RecoveryPauseRequested() check twice, how about the
following arrangement?

for (;;)
{
if (!RecoveryPauseRequested())
PG_RETURN_BOOL(false);

if (RecoveryIsPaused())
break;

pg_usleep(10000L);

CHECK_FOR_INTERRUPTS();

if (!RecoveryInProgress())
ereport(...);
}

PG_RETURN_BOOL(true);

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-01-16 04:20:36 Re: Key management with tests
Previous Message Bruce Momjian 2021-01-16 01:49:10 Re: Key management with tests