Re: Deadlock between backend and recovery may not be detected

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, Victor Yegorov <vyegorov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Deadlock between backend and recovery may not be detected
Date: 2020-12-22 14:58:33
Message-ID: b2745637-2333-4b8e-7aaf-5b880ac5224f@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/12/22 20:42, Fujii Masao wrote:
>
>
> On 2020/12/22 10:25, Masahiko Sawada wrote:
>> On Fri, Dec 18, 2020 at 6:36 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>
>>>
>>>
>>> On 2020/12/17 2:15, Fujii Masao wrote:
>>>>
>>>>
>>>> On 2020/12/16 23:28, Drouvot, Bertrand wrote:
>>>>> Hi,
>>>>>
>>>>> On 12/16/20 2:36 PM, Victor Yegorov wrote:
>>>>>>
>>>>>> *CAUTION*: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>>>>>
>>>>>>
>>>>>> ср, 16 дек. 2020 г. в 13:49, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com <mailto:masao(dot)fujii(at)oss(dot)nttdata(dot)com>>:
>>>>>>
>>>>>>      After doing this procedure, you can see the startup process and backend
>>>>>>      wait for the table lock each other, i.e., deadlock. But this deadlock remains
>>>>>>      even after deadlock_timeout passes.
>>>>>>
>>>>>>      This seems a bug to me.
>>>>>>
>>>>> +1
>>>>>
>>>>>>
>>>>>>      > * Deadlocks involving the Startup process and an ordinary backend process
>>>>>>      > * will be detected by the deadlock detector within the ordinary backend.
>>>>>>
>>>>>>      The cause of this issue seems that ResolveRecoveryConflictWithLock() that
>>>>>>      the startup process calls when recovery conflict on lock happens doesn't
>>>>>>      take care of deadlock case at all. You can see this fact by reading the above
>>>>>>      source code comment for ResolveRecoveryConflictWithLock().
>>>>>>
>>>>>>      To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
>>>>>>      timer in ResolveRecoveryConflictWithLock() so that the startup process can
>>>>>>      send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
>>>>>>      Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
>>>>>>      the backend should check whether the deadlock actually happens or not.
>>>>>>      Attached is the POC patch implimenting this.
>>>>>>
>>>>> good catch!
>>>>>
>>>>> I don't see any obvious reasons why the STANDBY_DEADLOCK_TIMEOUT shouldn't be set in ResolveRecoveryConflictWithLock() too (it is already set in ResolveRecoveryConflictWithBufferPin()).
>>>>>
>>>>> So + 1 to consider this as a bug and for the way the patch proposes to fix it.
>>>>
>>>> Thanks Victor and Bertrand for agreeing!
>>>> Attached is the updated version of the patch.
>>>
>>> Attached is v3 of the patch. Could you review this version?
>>>
>>> While the startup process is waiting for recovery conflict on buffer pin,
>>> it repeats sending the request for deadlock check to all the backends
>>> every deadlock_timeout. This may increase the workload in the startup
>>> process and backends, but since this is the original behavior, the patch
>>> doesn't change that. Also in practice this may not be so harmful because
>>> the period that the buffer is kept pinned is basically not so long.
>>>
>>
>> @@ -529,6 +603,26 @@ ResolveRecoveryConflictWithBufferPin(void)
>>       */
>>      ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
>>
>> +   if (got_standby_deadlock_timeout)
>> +   {
>> +       /*
>> +        * Send out a request for hot-standby backends to check themselves for
>> +        * deadlocks.
>> +        *
>> +        * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
>> +        * to be signaled by UnpinBuffer() again and send a request for
>> +        * deadlocks check if deadlock_timeout happens. This causes the
>> +        * request to continue to be sent every deadlock_timeout until the
>> +        * buffer is unpinned or ltime is reached. This would increase the
>> +        * workload in the startup process and backends. In practice it may
>> +        * not be so harmful because the period that the buffer is kept pinned
>> +        * is basically no so long. But we should fix this?
>> +        */
>> +       SendRecoveryConflictWithBufferPin(
>> +
>> PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
>> +       got_standby_deadlock_timeout = false;
>> +   }
>> +
>>
>> Since SendRecoveryConflictWithBufferPin() sends the signal to all
>> backends every backend who is waiting on a lock at ProcSleep() and not
>> holding a buffer pin blocking the startup process will end up doing a
>> deadlock check, which seems expensive. What is worse is that the
>> deadlock will not be detected because the deadlock involving a buffer
>> pin isn't detected by CheckDeadLock(). I thought we can replace
>> PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK with
>> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN but it’s not good because the
>> backend who has a buffer pin blocking the startup process and not
>> waiting on a lock is also canceled after deadlock_timeout. We can have
>> the backend return from RecoveryConflictInterrupt() when it received
>> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN and is not waiting on any lock,
>> but it’s also not good because we cannot cancel the backend after
>> max_standby_streaming_delay that has a buffer pin blocking the startup
>> process. So I wonder if we can have a new signal. When the backend
>> received this signal it returns from RecoveryConflictInterrupt()
>> without deadlock check either if it’s not waiting on any lock or if it
>> doesn’t have a buffer pin blocking the startup process. Otherwise it's
>> cancelled.
>
> Thanks for pointing out that issue! Using new signal is an idea. Another idea
> is to make a backend skip check the deadlock if GetStartupBufferPinWaitBufId()
> returns -1, i.e., the startup process is not waiting for buffer pin. So,
> what I'm thinkins is;
>
> In RecoveryConflictInterrupt(), when a backend receive
> PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
>
> 1. If a backend isn't waiting for a lock, it does nothing .
> 2. If a backend is waiting for a lock and also holding a buffer pin that
>    delays recovery, it may be canceled.
> 3. If a backend is waiting for a lock and the startup process is not waiting
>    for buffer pin (i.e., the startup process is also waiting for a lock),
>    it checks for the deadlocks.
> 4. If a backend is waiting for a lock and isn't holding a buffer pin that
>    delays recovery though the startup process is waiting for buffer pin,
>    it does nothing.

I implemented this. Patch attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
recovery_conflict_lock_deadlock_v4.patch text/plain 9.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2020-12-22 15:03:05 Re: [PATCH] Automatic HASH and LIST partition creation
Previous Message Fabien COELHO 2020-12-22 14:29:56 Re: [PATCH] Automatic HASH and LIST partition creation