Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: dilipbalaut(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.
Date: 2021-05-19 07:43:52
Message-ID: 20210519.164352.1786666023204393975.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 19 May 2021 16:21:58 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2021/05/19 15:25, Kyotaro Horiguchi wrote:
> > At Wed, 19 May 2021 11:19:13 +0530, Dilip Kumar
> > <dilipbalaut(at)gmail(dot)com> wrote in
> >> On Wed, May 19, 2021 at 10:16 AM Fujii Masao
> >> <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> >>>
> >>> On 2021/05/18 15:46, Michael Paquier wrote:
> >>>> On Tue, May 18, 2021 at 12:48:38PM +0900, Fujii Masao wrote:
> >>>>> Currently a promotion causes all available WAL to be replayed before
> >>>>> a standby becomes a primary whether it was in paused state or not.
> >>>>> OTOH, something like immediate promotion (i.e., standby becomes
> >>>>> a primary without replaying outstanding WAL) might be useful for
> >>>>> some cases. I don't object to that.
> >>>>
> >>>> Sounds like a "promotion immediate" mode. It does not sound difficult
> >>>> nor expensive to add a small test for that in one of the existing
> >>>> recovery tests triggering a promotion. Could you add one based on
> >>>> pg_get_wal_replay_pause_state()?
> >>>
> >>> Are you thinking of adding a test like the following?
> >>> #1. Pause the recovery
> >>> #2. Confirm that pg_get_wal_replay_pause_state() returns 'paused'
> >>> #3. Trigger standby promotion
> >>> #4. Confirm that pg_get_wal_replay_pause_state() returns 'not paused'
> >>>
> >>> It does not seem easy to run test #4 reliably because
> >>> pg_get_wal_replay_pause_state() needs to be executed
> >>> before the promotion finishes.
> >>
> >> Even for #2, we cannot be sure whether it will be 'paused' or
> >> 'pause requested'.
> > We often use poll_query_until() to make sure some desired state is
> > reached.
>
> Yes.
>
> > And, as Michael suggested, the function
> > pg_get_wal_replay_pause_state() still works at the time of
> > recovery_end_command. So a bit more detailed steps are:
>
> IMO this idea is tricky and fragile, so I'm inclined to avoid that if

Agreed. The recovery_end_command would be something like the following,
which avoids a dependency on sh. However, I'm not sure whether it works
as well on Windows.

recovery_end_command='perl -MTime::HiRes=sleep -e "while (-f \'$trigfile\') { sleep 0.1; }"'
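
To spell out what the one-liner above is meant to do: it blocks while the trigger file exists, which presumably holds the server at the end of recovery long enough for a test to run a query. Here is a minimal Python sketch of the same wait loop. The function name, the `interval` and `timeout` parameters, and the timeout itself are assumptions added for illustration; they are not part of the proposed command.

```python
import os
import time


def wait_for_removal(path, interval=0.1, timeout=30.0):
    """Block while `path` is an existing regular file.

    Returns True once the file is gone, or False if `timeout` seconds
    pass first. The timeout is an illustrative addition; the one-liner
    above waits indefinitely.
    """
    deadline = time.monotonic() + timeout
    while os.path.isfile(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # sleep between checks instead of spinning
    return True
```

A test driving this would create the file before triggering the promotion and remove it once it has finished its checks.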

> possible.
> Attached is the POC patch to add the following tests.
>
> #1. Check that pg_get_wal_replay_pause_state() reports "not paused" at
>     first.
> #2. Request to pause archive recovery and wait until it's actually
>     paused.
> #3. Request to resume archive recovery and wait until it's actually
>     resumed.
> #4. Request to pause archive recovery and wait until it's actually
>     paused. Then, check that the paused state ends and promotion
>     continues if a promotion is triggered while recovery is paused.
>
> In #4, pg_get_wal_replay_pause_state() is not executed while promotion
> is ongoing. #4 checks that pg_is_in_recovery() returns false and
> the promotion finishes as expected in that case. Isn't this test enough
> for now?

+1 for adding some tests for pg_wal_replay_pause(), but the test seems
to check only that pg_get_wal_replay_pause_state() returns the
expected state value. Don't we need to check that recovery is
actually paused and that the promotion happens at the expected LSN?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
