Re: Is Recovery actually paused?

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Is Recovery actually paused?
Date: 2021-02-04 04:58:02
Message-ID: CAFiTN-tTAa66hEFUAF=6_H9SFe9NJe34Yp95-vFnS4tk4F=s3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 1, 2021 at 1:41 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Feb 1, 2021 at 11:59 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Sun, 31 Jan 2021 11:24:30 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> > > On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote:
> > > > >
> > > > > On Thu, 28 Jan 2021 09:55:42 +0530
> > > > > Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > >
> > > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote:
> > > > > > > >
> > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > > > > Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > > > > >
> > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > > > > > > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > > > > > > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > > > > > > > > > > > in a function, I expect a boolean output. Others may have better
> > > > > > > > > > > > thoughts.
> > > > > > > > > > >
> > > > > > > > > > > Maybe we should leave the existing function pg_is_wal_replay_paused()
> > > > > > > > > > > alone and add a new one with the name you suggest that returns text.
> > > > > > > > > > > That would create less burden for tool authors.
> > > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > > > > >
> > > > > > > > This means pg_is_wal_replay_paused is left without any change and this
> > > > > > > > returns whether pause is requested or not? If so, it seems good to modify
> > > > > > > > the documentation of this function in order to note that this could not
> > > > > > > > return the actual pause state.
> > > > > > >
> > > > > > > Yes, we can say that it will return true if the replay pause is
> > > > > > > requested. I am changing that in my new patch.
> > > > > >
> > > > > > I have modified the patch, changes
> > > > > >
> > > > > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > > > > the pause request state
> > > > > > - Now, we are not waiting for the recovery to actually get paused so I
> > > > > > think it doesn't make sense to put a lot of checkpoints to check the
> > > > > > pause requested so I have removed that check from the
> > > > > > recoveryApplyDelay but I think it better we still keep that check in
> > > > > > the WaitForWalToBecomeAvailable because it can wait forever before the
> > > > > > next wal get available.
> > > > >
> > > > > I think basically the check in WaitForWalToBecomeAvailable is independent
> > > > > of the feature of pg_get_wal_replay_pause_state, that is, reporting the
> > > > > actual pause state. This function could just return 'pause requested'
> > > > > if a pause is requested during waiting for WAL.
> > > > >
> > > > > However, I agree the change to allow recovery to transit the state to
> > > > > 'paused' during WAL waiting because 'paused' has more useful information
> > > > > for users than 'pause requested'. Returning 'paused' lets users know
> > > > > clearly that no more WAL are applied until recovery is resumed. On the
> > > > > other hand, when 'pause requested' is returned, user can't say whether
> > > > > the next WAL wiill be applied or not from this information.
> > > > >
> > > > > For the same reason, I think it is also useful to call recoveryPausesHere
> > > > > in recoveryApplyDelay.
> > > >
> > > > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get
> > > > available so it can not be controlled by user so it is good to put a
> > > > check for the recovery pause, however recoveryApplyDelay wait for the
> > > > apply delay which is configured by user and it is predictable value by
> > > > the user. I don't have much objection to putting that check in the
> > > > recoveryApplyDelay as well but I feel it is not necessary. Any other
> > > > thoughts on this?
> > > >
> > > > > In addition, in RecoveryRequiresIntParameter, recovery should get paused
> > > > > if a parameter value has a problem. However, pg_get_wal_replay_pause_state
> > > > > will return 'pause requested' in this case. So, I think, we should pass
> > > > > RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED,
> > > > > or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere().
> > > >
> > > > Yeah, absolutely right, it must pass RECOVERY_PAUSED. I will change
> > > > this, thanks for noticing this.
> > >
> > > I have changed this in the new patch.
> >
> > It seems to work well. The checkpoints seems to be placed properly.
>
> Okay
>
> > +SetRecoveryPause(RecoveryPauseState state)
> > {
> > + Assert(state >= RECOVERY_NOT_PAUSED && state <= RECOVERY_PAUSED);
> >
> > I'm not sure that state worth FATAL. Isn't it enough to just ERROR
> > out like XLogFileRead?
>
> Yeah, that makes sense to me.
>
> > CheckAndSetRecovery() has only one caller. I think it's better to
> > write the code directly.
>
> Okay, I will change.
>
> > I think the documentation of pg_wal_replay_pause needs to be a bit
> > more detailed about the difference between the two states "pause
> > requested" and "paused". Something like "A request doesn't mean that
> > recovery stops right away. If you want a guarantee that recovery is
> > actually paused, you need to check for the recovery pause state
> > returned by pg_wal_replay_pause_state(). Note that
> > pg_is_wal_repay_paused() returns whether a request is made."
>
> That seems like better idea, I will change.
>

Please find an updated patch which addresses these comments.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v10-0001-Provide-a-new-interface-to-get-the-recovery-paus.patch text/x-patch 12.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-04 05:43:39 Re: pg_replication_origin_drop API potential race condition
Previous Message Tom Lane 2021-02-04 04:41:27 Re: a curious case of force_parallel_mode = on with jit'ing