Re: Is Recovery actually paused?

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: dilipbalaut(at)gmail(dot)com, bharath(dot)rupireddyforpostgres(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, robertmhaas(at)gmail(dot)com, simon(at)2ndquadrant(dot)com
Subject: Re: Is Recovery actually paused?
Date: 2021-02-09 03:23:23
Message-ID: 20210209122323.8efabdc2929f0f2311e2aa50@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:

> At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote:
> > >
> > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST)
> > > Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > >
> > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote in
> > > > > > > > I think the right fix should be that the state should never go from
> > > > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should take
> > > > > > > > care of that.
> > > > > > >
> > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder
> > > > > > > it can not handle the case that a user resume and pause again while a sleep.
> > > > > >
> > > > > > Right, we will have to check and set in the loop. But we should not
> > > > > > allow the state to go from paused to pause requested irrespective of
> > > > > > this.
> > > > >
> > > > > I agree with you.
> > > >
> > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we
> > > > immediately change the state to PAUSE always we see REQUESTED in the
> > > > waiting loop, despite that we allow change the state from PAUSE to
> > > > REQUESTED via NOT_PAUSED between two successive loop condition checks?
> > >
> > > If a user call pg_wal_replay_pause while recovery is paused, users can
> > > observe 'pause requested' during a sleep alghough the time window is short.
> > > It seems a bit odd that pg_wal_replay_pause changes the state like this
> > > because This state meeans that recovery may not be 'paused'.
> >
> > Yeah, this appears wrong that after 'paused' we go back to 'pause
> > requested'. the logical state transition should always be as below
> >
> > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to
> > request and then paused but there is nothing wrong with going to
> > paused)
> > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get paused)
> > PAUSED -> NOT PAUSED (from PAUSED we should not go to the
> > PAUSE_REQUESTED without going to NOT PAUSED)
>
> I didn't asked about the internal logical correctness, but asked about
> *actual harm* revealed to users. I don't see any actual harm in the
> "wrong" transition because:

Actually, the incorrect state transition is not so harmful except that
users can observe unnecessary state changes. However, I don't think any
actual harm in prohibit the incorrect state transition. So, I think we
can do it.

> If we are going to introduce that complexity, I'd like to re-propose
> to introduce interlocking between the recovery side and the
> pause-requestor side instead of introducing the intermediate state,
> which is the cause of the complexity.
>
> The attached PoC patch adds:
>
> - A solid checkpoint just before calling rm_redo. It doesn't add a
> info_lck since the check is done in the existing lock section.
>
> - Interlocking between the above and SetRecoveryPause without adding a
> shared variable.
> (This is what I called "synchronous" before.)

I think waiting in pg_wal_replay_pasue is a possible option, but this will
also introduce other complexity to codes such as possibility of waiting for
long or for ever. For example, waiting in SetRecoveryPause as in your POC
patch appears to make recovery stuck in RecoveryRequiresIntParameter.

By the way, attaching other patch to a thread without the original patch
will make commitfest and cfbot APP confused...

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-09 03:57:48 Re: pg_replication_origin_drop API potential race condition
Previous Message Amit Kapila 2021-02-09 03:02:42 Re: repeated decoding of prepared transactions