Re: Delay of standby shutdown

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Delay of standby shutdown
Date: 2020-11-12 03:24:48
Message-ID: 87b9d893-101e-664f-270e-6029b1fe7a0c@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/11/04 9:35, Soumyadeep Chakraborty wrote:
> Hello Fujii,
>
> On Thu, Sep 17, 2020 at 6:49 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
> As far as I understand after debugging, the problem is as follows:

Yes.

>
> 1. After the primary is stopped, and the standby startup process is
> waiting inside:
>
> (void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
> WL_LATCH_SET | WL_TIMEOUT |
> WL_EXIT_ON_PM_DEATH,
> wait_time,
> WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
>
> it receives SIGTERM on account of stopping the standby and it leads to
> the WaitLatch call returning with WL_LATCH_SET.
>
> 2. WaitForWALToBecomeAvailable() then will return true after calling
> XLogFileReadAnyTLI() and eventually, XLogReadRecord() will return NULL
> since there is no new WAL to read, which means ReadRecord() will loop
> back and perform another XLogReadRecord().
>
> 3. The additional XLogReadRecord() will lead to a 5s wait inside
> WaitForWALToBecomeAvailable() - a different WaitLatch() call this time:
>
> /*
> * Wait for more WAL to arrive. Time out after 5 seconds
> * to react to a trigger file promptly and to check if the
> * WAL receiver is still active.
> */
> (void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
> WL_LATCH_SET | WL_TIMEOUT |
> WL_EXIT_ON_PM_DEATH,
> 5000L, WAIT_EVENT_RECOVERY_WAL_STREAM);
>
> 4. And then eventually, the code will handle interrupts here inside
> WaitForWALToBecomeAvailable() after the 5s wait:
>
> /*
> * This possibly-long loop needs to handle interrupts of startup
> * process.
> */
> HandleStartupProcInterrupts();
>
>> To avoid this issue, I think that ReadRecord() should call
>> HandleStartupProcInterrupts() whenever looping back to retry.
>
> Alternatively, perhaps we can place it inside
> WaitForWALToBecomeAvailable() (which already has a similar call),
> since it is more semantically aligned to checking for interrupts, rather
> than ReadRecord()? Like this:

Yes. Your approach looks better to me.
Attached is the updated version of the patch implementing that.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
shutdown_in_standby_v2.patch text/plain 583 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2020-11-12 03:48:17 Re: Strange GiST logic leading to uninitialized memory access in pg_trgm gist code
Previous Message Amit Kapila 2020-11-12 03:16:55 Re: logical streaming of xacts via test_decoding is broken