Re: Postgres stucks in deadlock detection

From: Andres Freund <andres(at)anarazel(dot)de>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres stucks in deadlock detection
Date: 2018-04-13 18:09:48
Message-ID: 20180413180948.rj5e3bsxhilvdccr@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2018-04-13 19:13:07 +0300, Konstantin Knizhnik wrote:
> On 13.04.2018 18:41, Andres Freund wrote:
> > On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
> > > Updated patch is attached.
> > > + /*
> > > + * Ensure that only one backend is checking for deadlock.
> > > + * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
> > > + */
> > > + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
> > > + {
> > > + enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> > > + return;
> > > + }
> > > + inside_deadlock_check = true;
> > I can't see that ever being accepted. This means there's absolutely no
> > bound for deadlock checks happening even under light concurrency, even
> > if there's no contention for a large fraction of the time.
>
> It may cause problems only if
> 1. There is large number of active sessions
> 2. They perform deadlock-prone queries (so no attempts to avoid deadlocks at
> application level)
> 3. Deadlock timeout is set to be very small (10 msec?)

That's just not true.

> Otherwise either probability that all backends  once and once again are
> trying to check deadlocks concurrently is very small (and can be even more
> reduced by using random timeout for subsequent deadlock checks), either
> system can not normally function in any case because large number of clients
> fall into deadlock.

Operating systems batch wakeups.

> I completely agree that there are plenty of different approaches, but IMHO
> the currently used strategy is the worst one, because it can stall system
> even if there are not deadlocks at all.

> I always think that deadlock is a programmer's error rather than normal
> situation. May be it is wrong assumption

It is.

> So before implementing some complicated solution of the problem9too slow
> deadlock detection), I think that first it is necessary to understand
> whether there is such problem at al and under which workload it can happen.

Sure. I'm not saying that you shouldn't experiment with a patch like the
one you sent. What I am saying is that that can't be the actual solution
that will be integrated.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-04-13 18:21:33 Re: crash with sql language partition support function
Previous Message Alvaro Herrera 2018-04-13 18:08:30 Re: crash with sql language partition support function