Quick Links

Re: Delay of standby shutdown

From:	Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To:	Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Delay of standby shutdown
Date:	2020-11-12 03:24:48
Message-ID:	87b9d893-101e-664f-270e-6029b1fe7a0c@oss.nttdata.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2020/11/04 9:35, Soumyadeep Chakraborty wrote:
> Hello Fujii,
>
> On Thu, Sep 17, 2020 at 6:49 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
> As far as I understand after debugging, the problem is as follows:

Yes.

>
> 1. After the primary is stopped, and the standby startup process is
> waiting inside:
>
> (void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
> WL_LATCH_SET | WL_TIMEOUT |
> WL_EXIT_ON_PM_DEATH,
> wait_time,
> WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
>
> it receives SIGTERM on account of stopping the standby and it leads to
> the WaitLatch call returning with WL_LATCH_SET.
>
> 2. WaitForWALToBecomeAvailable() then will return true after calling
> XLogFileReadAnyTLI() and eventually, XLogReadRecord() will return NULL
> since there is no new WAL to read, which means ReadRecord() will loop
> back and perform another XLogReadRecord().
>
> 3. The additional XLogReadRecord() will lead to a 5s wait inside
> WaitForWALToBecomeAvailable() - a different WaitLatch() call this time:
>
> /*
> * Wait for more WAL to arrive. Time out after 5 seconds
> * to react to a trigger file promptly and to check if the
> * WAL receiver is still active.
> */
> (void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
> WL_LATCH_SET | WL_TIMEOUT |
> WL_EXIT_ON_PM_DEATH,
> 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
>
> 4. And then eventually, the code will handle interrupts here inside
> WaitForWALToBecomeAvailable() after the 5s wait:
>
> /*
> * This possibly-long loop needs to handle interrupts of startup
> * process.
> */
> HandleStartupProcInterrupts();
>
>> To avoid this issue, I think that ReadRecord() should call
>> HandleStartupProcInterrupts() whenever looping back to retry.
>
> Alternatively, perhaps we can place it inside
> WaitForWALToBecomeAvailable() (which already has a similar call),
> since it is more semantically aligned to checking for interrupts, rather
> than ReadRecord()? Like this:

Yes. Your approach looks better to me.
Attached is the updated version of the patch implementing that.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment	Content-Type	Size
shutdown_in_standby_v2.patch	text/plain	583 bytes

In response to

Re: Delay of standby shutdown at 2020-11-04 00:35:00 from Soumyadeep Chakraborty

Responses

Re: Delay of standby shutdown at 2020-11-12 17:58:27 from Soumyadeep Chakraborty

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alexander Korotkov	2020-11-12 03:48:17	Re: Strange GiST logic leading to uninitialized memory access in pg_trgm gist code
Previous Message	Amit Kapila	2020-11-12 03:16:55	Re: logical streaming of xacts via test_decoding is broken