Re: Add Information during standby recovery conflicts

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add Information during standby recovery conflicts
Date: 2020-12-04 10:22:17
Message-ID: 73037867-b688-c5cc-5f6d-ad73db6ab94d@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 12/4/20 2:21 AM, Fujii Masao wrote:
>
> On 2020/12/04 9:28, Masahiko Sawada wrote:
>> On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao
>> <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>
>>>
>>>
>>> On 2020/12/01 17:29, Drouvot, Bertrand wrote:
>>>> Hi,
>>>>
>>>> On 12/1/20 12:35 AM, Masahiko Sawada wrote:
>>>>> CAUTION: This email originated from outside of the organization.
>>>>> Do not click links or open attachments unless you can confirm the
>>>>> sender and know the content is safe.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera
>>>>> <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>>>>>> On 2020-Dec-01, Fujii Masao wrote:
>>>>>>
>>>>>>> +                     if (proc)
>>>>>>> +                     {
>>>>>>> +                             if (nprocs == 0)
>>>>>>> + appendStringInfo(&buf, "%d", proc->pid);
>>>>>>> +                             else
>>>>>>> + appendStringInfo(&buf, ", %d", proc->pid);
>>>>>>> +
>>>>>>> +                             nprocs++;
>>>>>>>
>>>>>>> What happens if all the backends in wait_list have gone? In
>>>>>>> other words,
>>>>>>> how should we handle the case where nprocs == 0 (i.e., nprocs
>>>>>>> has not been
>>>>>>> incrmented at all)? This would very rarely happen, but can happen.
>>>>>>> In this case, since buf.data is empty, at least there seems no
>>>>>>> need to log
>>>>>>> the list of conflicting processes in detail message.
>>>>>> Yes, I noticed this too; this can be simplified by changing the
>>>>>> condition in the ereport() call to be "nprocs > 0" (rather than
>>>>>> wait_list being null), otherwise not print the errdetail.  (You
>>>>>> could
>>>>>> test buf.data or buf.len instead, but that seems uglier to me.)
>>>>> +1
>>>>>
>>>>> Maybe we can also improve the comment of this function from:
>>>>>
>>>>> + * This function also reports the details about the conflicting
>>>>> + * process ids if *wait_list is not NULL.
>>>>>
>>>>> to " This function also reports the details about the conflicting
>>>>> process ids if exist" or something.
>>>>>
>>>> Thank you all for the review/remarks.
>>>>
>>>> They have been addressed in the new attached patch version.
>>>
>>> Thanks for updating the patch! I read through the patch again
>>> and applied the following chages to it. Attached is the updated
>>> version of the patch. Could you review this version? If there is
>>> no issue in it, I'm thinking to commit this version.
>>
>> Thank you for updating the patch! I have one question.
>>
>>>
>>> +                       timeouts[cnt].id = STANDBY_TIMEOUT;
>>> +                       timeouts[cnt].type = TMPARAM_AFTER;
>>> +                       timeouts[cnt].delay_ms = DeadlockTimeout;
>>>
>>> Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here?
>>> I changed the code that way.
>>
>> As the comment of ResolveRecoveryConflictWithLock() says the
>> following, a deadlock is detected by the ordinary backend process:
>>
>>   * Deadlocks involving the Startup process and an ordinary backend
>> proces
>>   * will be detected by the deadlock detector within the ordinary
>> backend.
>>
>> If we use STANDBY_DEADLOCK_TIMEOUT,
>> SendRecoveryConflictWithBufferPin() will be called after
>> DeadlockTimeout passed, but I think it's not necessary for the startup
>> process in this case.
>
> Thanks for pointing this! You are right.
>
>
>> If we want to just wake up the startup process
>> maybe we can use STANDBY_TIMEOUT here?
>
Thanks for the patch updates! Except what we are still discussing below,
it looks good to me.

> When STANDBY_TIMEOUT happens, a request to release conflicting buffer
> pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there?

Agree

>
> Or, first of all, we don't need to enable the deadlock timer at all?
> Since what we'd like to do is to wake up after deadlock_timeout
> passes, we can do that by changing ProcWaitForSignal() so that it can
> accept the timeout and giving the deadlock_timeout to it. If we do
> this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from
> ResolveRecoveryConflictWithLock(). Thought?

Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it triggers
a call to StandbyLockTimeoutHandler() which does nothing, except waking
up. That's what we want, right?)

I've attached a new version that makes use of it (that's the only change
compare to Masao's updates).

Thanks

Bertrand

Attachment Content-Type Size
v10-0002-Log-the-standby-recovery-conflict-waits.patch text/plain 19.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-12-04 11:15:29 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Previous Message torikoshia 2020-12-04 10:16:12 Re: Get memory contexts of an arbitrary backend process