Re: Postgres stucks in deadlock detection

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres stucks in deadlock detection
Date: 2018-04-05 08:03:58
Message-ID: ef81da49-d491-db86-3ef6-5138d091fe91@postgrespro.ru
Lists: pgsql-hackers

Hi,
Thanks for your feedback.

On 04.04.2018 21:15, Andres Freund wrote:
> Hi,
>
> On 2018-04-04 11:54:14 +0300, Konstantin Knizhnik wrote:
>> Several times, we and our customers have suffered from the problem of
>> Postgres getting stuck in deadlock detection.
>> One scenario is a YCSB workload with a Zipfian distribution, where many clients
>> try to update the same record and compete for its lock.
>> Another scenario is a large number of clients performing inserts into the same
>> table. In this case the source of the problem is the relation extension lock.
>> In both cases the number of connections is large: several hundred.
> Have you ever observed that in the field? This sounds more artificial
> than real to me.

Yes, it was a real customer's problem. And it is not difficult to
reproduce: you just need a system with a large number of cores
(in our case 72 physical cores)
and 1000 clients performing ordinary inserts into the same table:

cat test.sql
begin;
insert into test select i, md5(i::text) from generate_series(1,1000) AS i;
end;

pgbench -U postgres test -c 1000 -j 36 -T 3600 -P 1 -f test.sql

With a deadlock detection timeout (deadlock_timeout) of 100 ms it happens
immediately.
With the default 1 second deadlock timeout it takes about 12 minutes,
after which the following messages can be found in the log:

2018-03-23 20:25:27.287 MSK [41469] STATEMENT: insert into test select
i, md5(i::text) from generate_series(1,1000) AS i;
2018-03-23 20:25:27.287 MSK [41707] LOG: process 41707 acquired
ExclusiveLock on extension of relation 16387 of database 16384 after
61294.863 ms
2018-03-23 20:25:27.287 MSK [41707] STATEMENT: insert into test select
i, md5(i::text) from generate_series(1,1000) AS i;
2018-03-23 20:25:27.303 MSK [42251] LOG: process 42251 still waiting for
ExclusiveLock on extension of relation 16387 of database 16384 after
1121.274 ms
2018-03-23 20:25:27.303 MSK [42251] DETAIL: Process holding the lock:
41707. Wait queue: 41577, 41880, 41482, 41649, 42275, 42237, 42117,
41430, 42121, 42091, 41507, 42265, 41738, 42018, 41490, 42048, 42269,
42108, 41843, 42213, 42134, 42341, 42083, 41763, 41536, 41980, 41793,
41878, 42065, 42152, 42193, 42022, 42103, 42354, 41849, 42120, 41520,
41593, 42020, 42123, 42012, 42235, 42242, 41982, 41661, 41734, 42385,
41756, 41733, 41415, 41874, 42161, 41749, 41431, 42175, 42100, 42222,
41435, 41762, 42352, 41840, 41879, 42348, 41445, 42088, 42187, 41992,
42327, 42258, 41586, 42034, 41851, 41440, 42192, 41726, 41471, 42185,
41607, 41844, 42016, 41868, 41397, 41932, 42343, 41545, 41690, 41549,
41684, 42304, 42105, 41551, 41580, 41728, 42172, 41898, 41560, 41411,
41657, 41444, 42170, 41481, 42241, 42263, 41884, 42014, 41608, 42289,
42191, 42067, 42011, 41959, 41681, 41768, 42194, 42090, 41527, 41655,
41638, 41458, 41552, 41446, 41403, 41666, 42021, 41614, 41542, 41588,
41937, 42008, 42280, 42071, 41390, 41483...

At this moment TPS drops to zero and it is not even possible to log in to
Postgres.
Increasing the deadlock timeout to one minute still doesn't help:

2018-04-02 17:16:59 MSK [644699] [6-1]
db=bp,user=bp,app=[unknown],client=172.29.224.48 LOG: process 644699
acquired ExclusiveLock on extension of relation 23554 of database 16385
after 585248.495 ms
2018-04-02 17:16:59 MSK [629624]: [6-1]
db=bp,user=bp,app=[unknown],client=172.29.224.64 LOG: process 629624
acquired ExclusiveLock on extension of relation 23554 of database 16385
after 585132.924 ms
2018-04-02 17:16:59 MSK [646297]: [6-1]
db=bp,user=bp,app=[unknown],client=172.29.224.57 LOG: process 646297
acquired ExclusiveLock on extension of relation 23554 of database 16385
after 584982.438 ms
2018-04-02 17:16:59 MSK [629264]: [6-1]
db=bp,user=bp,app=[unknown],client=172.29.224.66 LOG: process 629264
acquired ExclusiveLock on extension of relation 23554 of database 16385
after 584959.859 ms

And this is the backtrace of the backend which blocks the system:

#0 0x00007fde34a73576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1 0x00007fde34a73668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x0000000000686b02 in PGSemaphoreLock (sema=0x7fcdc7748618) at pg_sema.c:310
#3 0x00000000006ec0d4 in LWLockAcquire (lock=0x7fcdc774ee80, mode=mode(at)entry=LW_EXCLUSIVE) at lwlock.c:1233
#4 0x00000000006e99c5 in CheckDeadLock () at proc.c:1671
#5 ProcSleep (locallock=locallock(at)entry=0x118a158, lockMethodTable=lockMethodTable(at)entry=0x9538a0 <default_lockmethod>) at proc.c:1261
#6 0x00000000006e49ef in WaitOnLock (locallock=locallock(at)entry=0x118a158, owner=owner(at)entry=0x12490f0) at lock.c:1702
#7 0x00000000006e5c0b in LockAcquireExtended (locktag=locktag(at)entry=0x7ffc5b5f7d10, lockmode=lockmode(at)entry=7, sessionLock=sessionLock(at)entry=0'\000', dontWait=dontWait(at)entry=0'\000',
reportMemoryError=reportMemoryError(at)entry=1'\001') at lock.c:998
#8 0x00000000006e6221 in LockAcquire (locktag=locktag(at)entry=0x7ffc5b5f7d10, lockmode=lockmode(at)entry=7, sessionLock=sessionLock(at)entry=0'\000', dontWait=dontWait(at)entry=0'\000') at lock.c:688
#9 0x00000000006e3d7c in LockRelationForExtension (relation=relation(at)entry=0x7fcdc73ecb68, lockmode=lockmode(at)entry=7) at lmgr.c:340
#10 0x00000000004b6ce4 in RelationGetBufferForTuple (relation=relation(at)entry=0x7fcdc73ecb68, len=72, otherBuffer=otherBuffer(at)entry=0, options=options(at)entry=0, bistate=bistate(at)entry=0x0,
vmbuffer=vmbuffer(at)entry=0x7ffc5b5f7dc0, vmbuffer_other=0x0) at hio.c:526
#11 0x00000000004b04db in heap_insert (relation=relation(at)entry=0x7fcdc73ecb68, tup=tup(at)entry=0x12768f8, cid=<optimized out>, options=options(at)entry=0, bistate=bistate(at)entry=0x0) at heapam.c:2418
#12 0x00000000005fce33 in ExecInsert (canSetTag=1'\001', estate=0x1263148, onconflict=<optimized out>, arbiterIndexes=0x0, planSlot=0x12638c8, slot=0x12638c8, mtstate=0x1263498)
at nodeModifyTable.c:513
#13 ExecModifyTable (pstate=0x1263498) at nodeModifyTable.c:1781
#14 0x00000000005e0a5a in ExecProcNode (node=0x1263498) at ../../../src/include/executor/executor.h:250
#15 ExecutePlan (execute_once=<optimized out>, dest=0x1260e18, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_INSERT, use_parallel_mode=<optimized out>,
planstate=0x1263498, estate=0x1263148) at execMain.c:1722
#16 standard_ExecutorRun (queryDesc=0x1291628, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#17 0x00000000006fc972 in ProcessQuery (plan=<optimized out>, sourceText=0x1236598"insert into test select i, md5(i::text) from generate_series(1,1000) AS i;", params=0x0, queryEnv=0x0,
dest=0x1260e18, completionTag=0x7ffc5b5f81d0 "") at pquery.c:161
#18 0x00000000006fcbb0 in PortalRunMulti (portal=portal(at)entry=0x1183888, isTopLevel=isTopLevel(at)entry=1'\001', setHoldSnapshot=setHoldSnapshot(at)entry=0'\000', dest=dest(at)entry=0x1260e18,
altdest=altdest(at)entry=0x1260e18, completionTag=completionTag(at)entry=0x7ffc5b5f81d0 "") at pquery.c:1286
#19 0x00000000006fd651 in PortalRun (portal=portal(at)entry=0x1183888, count=count(at)entry=9223372036854775807, isTopLevel=isTopLevel(at)entry=1'\001', run_once=run_once(at)entry=1'\001',
dest=dest(at)entry=0x1260e18, altdest=altdest(at)entry=0x1260e18, completionTag=0x7ffc5b5f81d0 "") at pquery.c:799
#20 0x00000000006f974f in exec_simple_query (query_string=0x1236598"insert into test select i, md5(i::text) from generate_series(1,1000) AS i;") at postgres.c:1099
#21 0x00000000006fb594 in PostgresMain (argc=<optimized out>, argv=argv(at)entry=0x11920e8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4088
#22 0x0000000000479887 in BackendRun (port=0x11862b0) at postmaster.c:4405
#23 BackendStartup (port=0x11862b0) at postmaster.c:4077
#24 ServerLoop () at postmaster.c:1755
#25 0x0000000000694fec in PostmasterMain (argc=argc(at)entry=3, argv=argv(at)entry=0x1165250) at postmaster.c:1363
#26 0x000000000047aca9 in main (argc=3, argv=0x1165250) at main.c:228

>
>
>> So, I see three possible ways to fix this problem:
>> 1. Yury Sokolov's patch with two phase deadlock check
>> 2. Avoid concurrent deadlock detection
>> 3. Avoid concurrent deadlock detection + let CheckDeadLock detect all
>> deadlocks, not only those in which current transaction is involved.
> 4) Don't do anything about deadlock detection, because this is just a
> symptom five steps removed from the cause.
>
> We've to pay attention to fixing actual problems, rather than purely
> benchmark ones. Complexity has a significant price.

Sorry, I have to repeat once again that it was a real customer's problem,
and the pgbench scenario described above is just the result of our attempts
to reproduce and investigate it.
Moreover, this is not the first time we have faced this problem of
Postgres stalling because of a cascade of deadlock detection timeout
expirations.

I do not think that deadlock detection is just "a symptom".
There is quite a simple explanation: deadlock detection tries to obtain
all lock partition locks exclusively (see the CheckDeadLock excerpt below).
So if, under high load, the deadlock timeout expires and deadlock detection
is launched, it causes an avalanche of lock conflicts, which in turn cause
the deadlock timeout to expire in other backends,
which then also try to perform deadlock detection and compete for the same
locks... And finally the system is stalled.

The proposed patch fixes the problem: there are still some periodic drops
in performance, but it never falls to zero and it is always possible
to log in to the system.
So I cannot agree with you that deadlock detection is "just a symptom
five steps removed from the cause".

Certainly, it is possible to find some other solution to the problem,
for example not using heavyweight locks for relation extension at all.
There is a huge discussion about it and a patch at the commitfest, with no
recent activity and unclear prospects:

https://www.postgresql.org/message-id/flat/CA%2BTgmobuP%2BQoJgaj2HWz5VoDgVA0KP7UGtTREY%2BPeghbPtWZLg%40mail(dot)gmail(dot)com#CA+TgmobuP+QoJgaj2HWz5VoDgVA0KP7UGtTREY+PeghbPtWZLg(at)mail(dot)gmail(dot)com

But once again, please notice that the same problem occurs not only
with relation extension locks.
Please read the description of the problem in Yury Sokolov's patch:

https://commitfest.postgresql.org/18/1602/

In that case there are only updates and tuple locks, but the source
of the problem is still deadlock detection, and the simplest workaround is
just to increase the deadlock timeout.

If you want, I can provide you access to our system where all the necessary
infrastructure is already installed, so you can easily reproduce the
problem and investigate it yourself.
But if something looks like a duck, swims like a duck, and quacks like a
duck, then it probably /is/ a duck ;)
It seems obvious that obtaining all partition locks in CheckDeadLock:

    /*
     * Acquire exclusive lock on the entire shared lock data structures.
     * Must grab LWLocks in partition-number order to avoid LWLock deadlock.
     *
     * Note that the deadlock check interrupt had better not be enabled
     * anywhere that this process itself holds lock partition locks, else
     * this will wait forever.  Also note that LWLockAcquire creates a
     * critical section, so that this routine cannot be interrupted by
     * cancel/die interrupts.
     */
    for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
        LWLockAcquire(LockHashPartitionLockByIndex(i), LW_EXCLUSIVE);

can cause huge contention with other backends, which in turn can cause the
deadlock timeout to expire in them and the problem to escalate.

On the other hand, the fact that any of these

a) increasing the deadlock timeout
b) avoiding concurrent execution of the deadlock check
c) performing the first stage of the deadlock check under a shared lock

has a very noticeable effect on the observed behavior, and some of them
completely eliminate the problem (a sketch of (b) follows below),
makes me think that deadlock detection is not a symptom but the main
source of the problem.
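
To make (b) concrete, here is a minimal sketch of the idea. It assumes a
hypothetical pg_atomic_flag field deadlock_check_in_progress added to
PROC_HDR and initialized in InitProcGlobal(); everything else uses existing
primitives. This is only an illustration, not the actual patch:

#include "postgres.h"

#include "port/atomics.h"
#include "storage/lock.h"
#include "storage/lwlock.h"
#include "storage/proc.h"
#include "utils/timeout.h"

static void
CheckDeadLockSerialized(void)
{
    int             i;
    DeadLockState   deadlock_state;

    /*
     * If another backend is already sweeping the lock partitions, do not
     * pile a second exclusive sweep on top of it: re-arm our own timer
     * and let that backend finish first.
     * (deadlock_check_in_progress is the hypothetical flag mentioned above.)
     */
    if (!pg_atomic_test_set_flag(&ProcGlobal->deadlock_check_in_progress))
    {
        enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
        return;
    }

    /* The same exclusive partition sweep as in the existing CheckDeadLock(). */
    for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
        LWLockAcquire(LockHashPartitionLockByIndex(i), LW_EXCLUSIVE);

    /* Run the usual wait-for graph search for our own process. */
    deadlock_state = DeadLockCheck(MyProc);

    /* Release in reverse order, as the original function does. */
    for (i = NUM_LOCK_PARTITIONS; --i >= 0;)
        LWLockRelease(LockHashPartitionLockByIndex(i));

    pg_atomic_clear_flag(&ProcGlobal->deadlock_check_in_progress);

    /*
     * Acting on deadlock_state (waking ourselves, reporting a hard deadlock,
     * logging) stays exactly as in the original function and is omitted here.
     */
    (void) deadlock_state;
}

With such serialization only one backend at a time sweeps the lock
partitions. Variant (c) instead keeps concurrent checks but takes the
partition locks in shared mode for the first pass, so ordinary lock traffic
is blocked only when a candidate deadlock actually has to be re-verified.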

And one last note. I know that 1000 connections is too many for
Postgres. But:
1. In the customer's case the problem is reproduced even with 100 connections.
2. On modern SMP systems with hundreds of cores, running hundreds of
backends is not something exotic or stupid.

And here we are faced with problems which Postgres has never had before:
1. Very large snapshots and the long time needed to compute them
2. High contention on ProcArray locks
3. Very inefficient memory usage because of local caches and prepared
statements
...

On such systems we are faced with many new bottlenecks, and we really
need to address them.
I hope nobody thinks that Postgres should only work well on
notebooks and quad-core desktops?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
