Re: Postgres stucks in deadlock detection

From: Юрий Соколов <funny(dot)falcon(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres stucks in deadlock detection
Date: 2018-04-14 07:09:19
Message-ID: CAL-rCA1CVze9Y8uqJTH2vCffCvggcWQO6UQqaJnV9Q60NnJiyQ@mail.gmail.com
Lists: pgsql-hackers

Fri, 13 Apr 2018, 21:10 Andres Freund <andres(at)anarazel(dot)de>:

> Hi,
>
> On 2018-04-13 19:13:07 +0300, Konstantin Knizhnik wrote:
> > On 13.04.2018 18:41, Andres Freund wrote:
> > > On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
> > > > Updated patch is attached.
> > > > + /*
> > > > + * Ensure that only one backend is checking for deadlock.
> > > > + * Otherwise under high load cascade of deadlock timeout
> > > > + * expirations can cause stuck of Postgres.
> > > > + */
> > > > + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
> > > > + {
> > > > + enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> > > > + return;
> > > > + }
> > > > + inside_deadlock_check = true;
> > > I can't see that ever being accepted. This means there's absolutely no
> > > bound for deadlock checks happening even under light concurrency, even
> > > if there's no contention for a large fraction of the time.
> >
> > It may cause problems only if
> > 1. There is a large number of active sessions
> > 2. They perform deadlock-prone queries (so no attempt is made to avoid
> > deadlocks at the application level)
> > 3. The deadlock timeout is set very small (10 msec?)
>
> That's just not true.
>
>
> > Otherwise either the probability that all backends try to check for
> > deadlocks concurrently again and again is very small (and can be reduced
> > even further by using a random timeout for subsequent deadlock checks), or
> > the system cannot function normally in any case because a large number of
> > clients fall into deadlock.
>
> Operating systems batch wakeups.
>
>
> > I completely agree that there are plenty of different approaches, but IMHO
> > the currently used strategy is the worst one, because it can stall the system
> > even if there are no deadlocks at all.
>
>
> > I always think that a deadlock is a programmer's error rather than a normal
> > situation. Maybe that is a wrong assumption
>
> It is.
>
>
> > So before implementing some complicated solution to the problem (too slow
> > deadlock detection), I think that first it is necessary to understand
> > whether there is such a problem at all and under which workload it can
> > happen.
>
> Sure. I'm not saying that you shouldn't experiment with a patch like the
> one you sent. What I am saying is that that can't be the actual solution
> that will be integrated.
>

What about my version, at
https://www.postgresql.org/message-id/flat/bac42052debbd66e8d5f786d8abe8db1(at)postgrespro(dot)ru
?
It still performs deadlock detection every time, but it first tries to
detect a deadlock while holding only a shared lock, and rechecks with an
exclusive lock only if a real deadlock looks likely.

Even a shared lock causes some stalling for active transactions, but in the
catastrophic situation, where many backends check for a nonexistent deadlock
at the same time, it greatly reduces the pause.

Regards,
Yura.
