From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, dilipbalaut(at)gmail(dot)com |
Cc: | michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing. |
Date: | 2021-05-19 07:21:58 |
Message-ID: | 25396c69-9f70-ca3e-5de3-f3a108ae2a26@oss.nttdata.com |
Lists: | pgsql-hackers |
On 2021/05/19 15:25, Kyotaro Horiguchi wrote:
> At Wed, 19 May 2021 11:19:13 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
>> On Wed, May 19, 2021 at 10:16 AM Fujii Masao
>> <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>
>>> On 2021/05/18 15:46, Michael Paquier wrote:
>>>> On Tue, May 18, 2021 at 12:48:38PM +0900, Fujii Masao wrote:
>>>>> Currently a promotion causes all available WAL to be replayed before
>>>>> a standby becomes a primary whether it was in paused state or not.
>>>>> OTOH, something like immediate promotion (i.e., standby becomes
>>>>> a primary without replaying outstanding WAL) might be useful for
>>>>> some cases. I don't object to that.
>>>>
>>>> Sounds like a "promotion immediate" mode. It does not sound difficult
>>>> nor expensive to add a small test for that in one of the existing
>>>> recovery tests triggering a promotion. Could you add one based on
>>>> pg_get_wal_replay_pause_state()?
>>>
>>> You're thinking to add the test like the following?
>>> #1. Pause the recovery
>>> #2. Confirm that pg_get_wal_replay_pause_state() returns 'paused'
>>> #3. Trigger standby promotion
>>> #4. Confirm that pg_get_wal_replay_pause_state() returns 'not paused'
>>>
>>> It does not seem easy to make test #4 stable, because
>>> pg_get_wal_replay_pause_state() needs to be executed
>>> before the promotion finishes.
>>
>> Even for #2, we cannot be sure whether it will be 'paused' or
>> 'pause requested'.
>
> We often use poll_query_until() to make sure some desired state is
> reached.
Yes.
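For readers unfamiliar with the polling pattern, here is a minimal Python sketch of why a one-shot check after requesting a pause is racy while polling is not. It does not talk to PostgreSQL: `FakeStandby`, `request_pause()`, `pause_state()` and `poll_until()` are hypothetical names, loosely modelled on pg_wal_replay_pause(), pg_get_wal_replay_pause_state() and the TAP helper poll_query_until(). The assumption is that the startup process only honours a pause request after advancing a few steps, so the state passes through 'pause requested' first.

```python
import time
from typing import Callable


def poll_until(query: Callable[[], object], expected: object,
               timeout: float = 5.0, interval: float = 0.01) -> bool:
    """Re-run `query` until it returns `expected`; give up after `timeout` s.

    Loosely modelled on the TAP helper poll_query_until(); not its real code.
    """
    deadline = time.monotonic() + timeout
    while True:
        if query() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeStandby:
    """Hypothetical in-memory stand-in for a standby (no PostgreSQL involved).

    A pause request takes effect only after the simulated startup process
    has advanced `lag` steps, so the state passes through 'pause requested'.
    """

    def __init__(self, lag: int = 3) -> None:
        self.state = "not paused"
        self._lag = lag
        self._countdown = 0

    def request_pause(self) -> None:
        # Stand-in for pg_wal_replay_pause(): only *requests* the pause.
        if self.state == "not paused":
            self.state = "pause requested"
            self._countdown = self._lag

    def pause_state(self) -> str:
        # Stand-in for pg_get_wal_replay_pause_state(); each call lets the
        # simulated startup process advance one step.
        if self.state == "pause requested":
            self._countdown -= 1
            if self._countdown <= 0:
                self.state = "paused"
        return self.state


if __name__ == "__main__":
    standby = FakeStandby()
    standby.request_pause()
    # A one-shot check right after the request is racy:
    print(standby.pause_state())  # pause requested
    # Polling waits for the state we actually want:
    print(poll_until(standby.pause_state, "paused"))  # True
```

The point of the design is that the test asserts on the eventual state rather than on whatever state happens to be visible at one instant.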
> And, as Michael suggested, the function
> pg_get_wal_replay_pause_state() still works at the time of
> recovery_end_command. So, in a bit more detail, the steps are:
IMO this idea is tricky and fragile, so I'm inclined to avoid that if possible.
Attached is the POC patch to add the following tests.
#1. Check that pg_get_wal_replay_pause_state() reports "not paused" at first.
#2. Request to pause archive recovery and wait until it's actually paused.
#3. Request to resume archive recovery and wait until it's actually resumed.
#4. Request to pause archive recovery and wait until it's actually paused.
Then, check that the paused state ends and promotion continues
if a promotion is triggered while recovery is paused.
In #4, pg_get_wal_replay_pause_state() is not executed while promotion
is ongoing. Instead, #4 checks that pg_is_in_recovery() returns false and
that the promotion finishes as expected. Isn't this test enough for now?
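The four steps above can be sketched against a simulated standby. This is not the attached patch (PostgreSQL's recovery tests are Perl TAP scripts); it is a Python sketch assuming a hypothetical `FakeStandby` that advances one step per query. `request_pause()`, `resume()`, `promote()`, `pause_state()` and `is_in_recovery()` are illustrative stand-ins for pg_wal_replay_pause(), pg_wal_replay_resume(), pg_promote(), pg_get_wal_replay_pause_state() and pg_is_in_recovery(). Note how step #4 never queries the pause state while the promotion is running; it only waits until the standby has left recovery.

```python
import time
from typing import Callable


def poll_until(query: Callable[[], object], expected: object,
               timeout: float = 5.0, interval: float = 0.01) -> bool:
    """Re-run `query` until it returns `expected` or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        if query() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeStandby:
    """Hypothetical simulated standby; each query advances it one step."""

    def __init__(self, lag: int = 2) -> None:
        self.state = "not paused"
        self.in_recovery = True
        self._lag = lag
        self._countdown = 0
        self._promote_requested = False

    def request_pause(self) -> None:  # stand-in for pg_wal_replay_pause()
        if self.state == "not paused":
            self.state = "pause requested"
            self._countdown = self._lag

    def resume(self) -> None:  # stand-in for pg_wal_replay_resume()
        self.state = "not paused"

    def promote(self) -> None:  # stand-in for pg_promote(wait => false)
        self._promote_requested = True

    def _step(self) -> None:
        if self._promote_requested:
            # Promotion ends any pause. Real PostgreSQL would first replay
            # the remaining WAL; that is collapsed into one step here.
            self.state = "not paused"
            self.in_recovery = False
            self._promote_requested = False
        elif self.state == "pause requested":
            self._countdown -= 1
            if self._countdown <= 0:
                self.state = "paused"

    def pause_state(self) -> str:  # stand-in for pg_get_wal_replay_pause_state()
        self._step()
        return self.state

    def is_in_recovery(self) -> bool:  # stand-in for pg_is_in_recovery()
        self._step()
        return self.in_recovery


def run_pause_test(standby: FakeStandby) -> None:
    # 1: recovery starts out not paused.
    assert standby.pause_state() == "not paused"
    # 2: request a pause and wait until it is actually paused.
    standby.request_pause()
    assert poll_until(standby.pause_state, "paused")
    # 3: resume and wait until replay is actually resumed.
    standby.resume()
    assert poll_until(standby.pause_state, "not paused")
    # 4: pause again, then promote. The pause state is not queried during
    # promotion; only wait until the standby has left recovery.
    standby.request_pause()
    assert poll_until(standby.pause_state, "paused")
    standby.promote()
    assert poll_until(standby.is_in_recovery, False)


if __name__ == "__main__":
    run_pause_test(FakeStandby())
    print("all steps passed")
```

Because every wait goes through polling, none of the steps depends on catching a transient state such as 'pause requested' at a particular moment.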
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment | Content-Type | Size |
---|---|---|
recovery_pause_test_v1.patch | text/plain | 2.4 KB |