Re: Deadlock between backend and recovery may not be detected

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, Victor Yegorov <vyegorov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Deadlock between backend and recovery may not be detected
Date: 2020-12-23 10:28:54
Message-ID: CAD21AoCXR107rCsZ987oFGnkfn1KJYLz__68sGhDP==3d2mU3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 22, 2020 at 11:58 PM Fujii Masao
<masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2020/12/22 20:42, Fujii Masao wrote:
> >
> >
> > On 2020/12/22 10:25, Masahiko Sawada wrote:
> >> On Fri, Dec 18, 2020 at 6:36 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> >>>
> >>>
> >>>
> >>> On 2020/12/17 2:15, Fujii Masao wrote:
> >>>>
> >>>>
> >>>> On 2020/12/16 23:28, Drouvot, Bertrand wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On 12/16/20 2:36 PM, Victor Yegorov wrote:
> >>>>>>
> >>>>>> *CAUTION*: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> >>>>>>
> >>>>>>
> >>>>>> ср, 16 дек. 2020 г. в 13:49, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com <mailto:masao(dot)fujii(at)oss(dot)nttdata(dot)com>>:
> >>>>>>
> >>>>>> After doing this procedure, you can see the startup process and backend
> >>>>>> wait for the table lock each other, i.e., deadlock. But this deadlock remains
> >>>>>> even after deadlock_timeout passes.
> >>>>>>
> >>>>>> This seems a bug to me.
> >>>>>>
> >>>>> +1
> >>>>>
> >>>>>>
> >>>>>> > * Deadlocks involving the Startup process and an ordinary backend process
> >>>>>> > * will be detected by the deadlock detector within the ordinary backend.
> >>>>>>
> >>>>>> The cause of this issue seems that ResolveRecoveryConflictWithLock() that
> >>>>>> the startup process calls when recovery conflict on lock happens doesn't
> >>>>>> take care of deadlock case at all. You can see this fact by reading the above
> >>>>>> source code comment for ResolveRecoveryConflictWithLock().
> >>>>>>
> >>>>>> To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
> >>>>>> timer in ResolveRecoveryConflictWithLock() so that the startup process can
> >>>>>> send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
> >>>>>> Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
> >>>>>> the backend should check whether the deadlock actually happens or not.
> >>>>>> Attached is the POC patch implimenting this.
> >>>>>>
> >>>>> good catch!
> >>>>>
> >>>>> I don't see any obvious reasons why the STANDBY_DEADLOCK_TIMEOUT shouldn't be set in ResolveRecoveryConflictWithLock() too (it is already set in ResolveRecoveryConflictWithBufferPin()).
> >>>>>
> >>>>> So + 1 to consider this as a bug and for the way the patch proposes to fix it.
> >>>>
> >>>> Thanks Victor and Bertrand for agreeing!
> >>>> Attached is the updated version of the patch.
> >>>
> >>> Attached is v3 of the patch. Could you review this version?
> >>>
> >>> While the startup process is waiting for recovery conflict on buffer pin,
> >>> it repeats sending the request for deadlock check to all the backends
> >>> every deadlock_timeout. This may increase the workload in the startup
> >>> process and backends, but since this is the original behavior, the patch
> >>> doesn't change that. Also in practice this may not be so harmful because
> >>> the period that the buffer is kept pinned is basically not so long.
> >>>
> >>
> >> @@ -529,6 +603,26 @@ ResolveRecoveryConflictWithBufferPin(void)
> >> */
> >> ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
> >>
> >> + if (got_standby_deadlock_timeout)
> >> + {
> >> + /*
> >> + * Send out a request for hot-standby backends to check themselves for
> >> + * deadlocks.
> >> + *
> >> + * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
> >> + * to be signaled by UnpinBuffer() again and send a request for
> >> + * deadlocks check if deadlock_timeout happens. This causes the
> >> + * request to continue to be sent every deadlock_timeout until the
> >> + * buffer is unpinned or ltime is reached. This would increase the
> >> + * workload in the startup process and backends. In practice it may
> >> + * not be so harmful because the period that the buffer is kept pinned
> >> + * is basically no so long. But we should fix this?
> >> + */
> >> + SendRecoveryConflictWithBufferPin(
> >> +
> >> PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
> >> + got_standby_deadlock_timeout = false;
> >> + }
> >> +
> >>
> >> Since SendRecoveryConflictWithBufferPin() sends the signal to all
> >> backends every backend who is waiting on a lock at ProcSleep() and not
> >> holding a buffer pin blocking the startup process will end up doing a
> >> deadlock check, which seems expensive. What is worse is that the
> >> deadlock will not be detected because the deadlock involving a buffer
> >> pin isn't detected by CheckDeadLock(). I thought we can replace
> >> PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK with
> >> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN but it’s not good because the
> >> backend who has a buffer pin blocking the startup process and not
> >> waiting on a lock is also canceled after deadlock_timeout. We can have
> >> the backend return from RecoveryConflictInterrupt() when it received
> >> PROCSIG_RECOVERY_CONFLICT_BUFFERPIN and is not waiting on any lock,
> >> but it’s also not good because we cannot cancel the backend after
> >> max_standby_streaming_delay that has a buffer pin blocking the startup
> >> process. So I wonder if we can have a new signal. When the backend
> >> received this signal it returns from RecoveryConflictInterrupt()
> >> without deadlock check either if it’s not waiting on any lock or if it
> >> doesn’t have a buffer pin blocking the startup process. Otherwise it's
> >> cancelled.
> >
> > Thanks for pointing out that issue! Using new signal is an idea. Another idea
> > is to make a backend skip check the deadlock if GetStartupBufferPinWaitBufId()
> > returns -1, i.e., the startup process is not waiting for buffer pin. So,
> > what I'm thinkins is;
> >
> > In RecoveryConflictInterrupt(), when a backend receive
> > PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
> >
> > 1. If a backend isn't waiting for a lock, it does nothing .
> > 2. If a backend is waiting for a lock and also holding a buffer pin that
> > delays recovery, it may be canceled.
> > 3. If a backend is waiting for a lock and the startup process is not waiting
> > for buffer pin (i.e., the startup process is also waiting for a lock),
> > it checks for the deadlocks.
> > 4. If a backend is waiting for a lock and isn't holding a buffer pin that
> > delays recovery though the startup process is waiting for buffer pin,
> > it does nothing.
>

Good idea! It could still happen that if the startup process sets
startupBufferPinWaitBufId to -1 after sending the signal and before
the backend checks it, the backend will end up doing an unmeaningful
deadlock check. But the likelihood would be low in practice.

I have two small comments on ResolveRecoveryConflictWithBufferPin() in
the v4 patch:

The code currently has three branches as follow:

if (ltime == 0)
{
enable a timeout for deadlock;
}
else if (GetCurrentTimestamp() >= ltime)
{
send recovery conflict signal;
}
else
{
enable two timeouts: ltime and deadlock
}

I think we can rearrange the code similar to the changes you made on
ResolveRecoveryConflictWithLock():

if (GetCurrentTimestamp() >= ltime && ltime != 0)
{
Resolve recovery conflict;
}
else
{
Enable one or two timeouts: ltime and deadlock
}

It's more consistent with ResolveRecoveryConflictWithLock(). And
currently the patch doesn't reset got_standby_deadlock_timeout in
(ltime == 0) case but it will also be resolved by this rearrangement.

---
If we always reset got_standby_deadlock_timeout before waiting it's
not necessary but we might want to clear got_standby_deadlock_timeout
also after disabling all timeouts to ensure that it's cleared at the
end of the function. In ResolveRecoveryConflictWithLock() we clear
both got_standby_lock_timeout and got_standby_deadlock_timeout after
disabling all timeouts but we don't do that in
ResolveRecoveryConflictWithBufferPin().

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-12-23 10:59:03 Re: Logical decoding without slots: decoding in lockstep with recovery
Previous Message Amit Kapila 2020-12-23 10:09:00 Re: Single transaction in the tablesync worker?