Re: Deadlock between backend and recovery may not be detected

From: Victor Yegorov <vyegorov(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Deadlock between backend and recovery may not be detected
Date: 2020-12-16 13:36:04
Message-ID: CAGnEbohzTqXperry5WaQY_S5AAGJ=+YiZ0BYZqtap2kVx-zvgw@mail.gmail.com
Lists: pgsql-hackers

On Wed, 16 Dec 2020 at 13:49, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:

> After doing this procedure, you can see the startup process and backend
> wait for the table lock each other, i.e., deadlock. But this deadlock remains
> even after deadlock_timeout passes.
>
> This seems a bug to me.
>
> > * Deadlocks involving the Startup process and an ordinary backend process
> > * will be detected by the deadlock detector within the ordinary backend.
>
> The cause of this issue seems that ResolveRecoveryConflictWithLock() that
> the startup process calls when recovery conflict on lock happens doesn't
> take care of deadlock case at all. You can see this fact by reading the above
> source code comment for ResolveRecoveryConflictWithLock().
>
> To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
> timer in ResolveRecoveryConflictWithLock() so that the startup process can
> send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
> Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
> the backend should check whether the deadlock actually happens or not.
> Attached is the POC patch implementing this.
>

I agree that this is a bug.

Unfortunately, we've been hit by it in production.
Such a deadlock will eventually make all sessions wait on the startup process,
making the streaming replica unusable. If the replica is used to offload
read-only queries from the primary, this causes downtime for the project.

If I understand things right, the session will release its locks
when max_standby_streaming_delay is reached.
But it would be much better if the conflict were resolved faster,
around deadlock_timeout.
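
For reference, a sketch of the two settings as they would appear in the
standby's postgresql.conf, shown with their documented defaults rather than
the values from any particular deployment:

```
# postgresql.conf on the standby (default values, for illustration only)

# How long a backend waits on a lock before running the deadlock check.
deadlock_timeout = 1s

# How long the startup process waits before cancelling standby queries
# that conflict with WAL being applied from streaming replication.
max_standby_streaming_delay = 30s
```

Assuming the understanding above is correct, with these defaults the stuck
sessions would be freed after about 30 seconds instead of about 1 second.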

So, a huge +1 from me for fixing it.

--
Victor Yegorov
