Re: Postgres stucks in deadlock detection

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres stucks in deadlock detection
Date: 2018-04-13 13:43:09
Message-ID: cab7de8a-2d58-65ec-63a9-f301a0e7611e@postgrespro.ru
Lists: pgsql-hackers

Updated patch is attached.

On 05.04.2018 11:03, Konstantin Knizhnik wrote:
> Hi,
> Thanks for your feedback.
>
> On 04.04.2018 21:15, Andres Freund wrote:
>> Hi,
>>
>> On 2018-04-04 11:54:14 +0300, Konstantin Knizhnik wrote:
>>> Several times we and our customers have suffered from Postgres getting
>>> stuck in deadlock detection.
>>> One scenario is a YCSB workload with a Zipfian distribution, where many
>>> clients try to update the same record and compete for its lock.
>>> Another scenario is a large number of clients performing inserts into the
>>> same table; here the source of the problem is the relation extension lock.
>>> In both cases the number of connections is large: several hundred.
>> Have you ever observed that in the field? This sounds more artificial
>> than real to me.
>
> Yes, it was a real customer's problem. And it is not difficult to
> reproduce: you just need a system with a large number of cores
> (in our case 72 physical cores)
> and 1000 clients performing normal inserts into the same table:
>
>
> cat test.sql
> begin;
> insert into test select i, md5(i::text) from generate_series(1,1000) AS i;
> end;
>
> pgbench -U postgres test -c 1000 -j 36 -T 3600 -P 1 -f test.sql
>
>
> With a deadlock timeout of 100 ms it happens immediately.
> With the default 1 second deadlock timeout it takes about 12 minutes,
> after which the following messages can be found in the log:
>
> 2018-03-23 20:25:27.287 MSK [41469] STATEMENT: insert into test select
> i, md5(i::text) from generate_series(1,1000) AS i;
> 2018-03-23 20:25:27.287 MSK [41707] LOG: process 41707 acquired
> ExclusiveLock on extension of relation 16387 of database 16384 after
> 61294.863 ms
> 2018-03-23 20:25:27.287 MSK [41707] STATEMENT: insert into test select
> i, md5(i::text) from generate_series(1,1000) AS i;
> 2018-03-23 20:25:27.303 MSK [42251] LOG: process 42251 still waiting
> for ExclusiveLock on extension of relation 16387 of database 16384
> after 1121.274 ms
> 2018-03-23 20:25:27.303 MSK [42251] DETAIL: Process holding the lock:
> 41707. Wait queue: 41577, 41880, 41482, 41649, 42275, 42237, 42117,
> 41430, 42121, 42091, 41507, 42265, 41738, 42018, 41490, 42048, 42269,
> 42108, 41843, 42213, 42134, 42341, 42083, 41763, 41536, 41980, 41793,
> 41878, 42065, 42152, 42193, 42022, 42103, 42354, 41849, 42120, 41520,
> 41593, 42020, 42123, 42012, 42235, 42242, 41982, 41661, 41734, 42385,
> 41756, 41733, 41415, 41874, 42161, 41749, 41431, 42175, 42100, 42222,
> 41435, 41762, 42352, 41840, 41879, 42348, 41445, 42088, 42187, 41992,
> 42327, 42258, 41586, 42034, 41851, 41440, 42192, 41726, 41471, 42185,
> 41607, 41844, 42016, 41868, 41397, 41932, 42343, 41545, 41690, 41549,
> 41684, 42304, 42105, 41551, 41580, 41728, 42172, 41898, 41560, 41411,
> 41657, 41444, 42170, 41481, 42241, 42263, 41884, 42014, 41608, 42289,
> 42191, 42067, 42011, 41959, 41681, 41768, 42194, 42090, 41527, 41655,
> 41638, 41458, 41552, 41446, 41403, 41666, 42021, 41614, 41542, 41588,
> 41937, 42008, 42280, 42071, 41390, 41483...
>
> At this moment TPS drops to zero and it is not even possible to log in
> to Postgres.
> Increasing the deadlock timeout to one minute still doesn't help:
>
> 2018-04-02 17:16:59 MSK [644699] [6-1]
> db=bp,user=bp,app=[unknown],client=172.29.224.48 LOG: process 644699
> acquired ExclusiveLock on extension of relation 23554 of database
> 16385 after 585248.495 ms
> 2018-04-02 17:16:59 MSK [629624]: [6-1]
> db=bp,user=bp,app=[unknown],client=172.29.224.64 LOG: process 629624
> acquired ExclusiveLock on extension of relation 23554 of database
> 16385 after 585132.924 ms
> 2018-04-02 17:16:59 MSK [646297]: [6-1]
> db=bp,user=bp,app=[unknown],client=172.29.224.57 LOG: process 646297
> acquired ExclusiveLock on extension of relation 23554 of database
> 16385 after 584982.438 ms
> 2018-04-02 17:16:59 MSK [629264]: [6-1]
> db=bp,user=bp,app=[unknown],client=172.29.224.66 LOG: process 629264
> acquired ExclusiveLock on extension of relation 23554 of database
> 16385 after 584959.859 ms
>
>
> And this is the backtrace of the backend which blocks the system:
>
> #0 0x00007fde34a73576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
> #1 0x00007fde34a73668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
> #2 0x0000000000686b02 in PGSemaphoreLock (sema=0x7fcdc7748618) at pg_sema.c:310
> #3 0x00000000006ec0d4 in LWLockAcquire (lock=0x7fcdc774ee80, mode=mode(at)entry=LW_EXCLUSIVE) at lwlock.c:1233
> #4 0x00000000006e99c5 in CheckDeadLock () at proc.c:1671
> #5 ProcSleep (locallock=locallock(at)entry=0x118a158, lockMethodTable=lockMethodTable(at)entry=0x9538a0 <default_lockmethod>) at proc.c:1261
> #6 0x00000000006e49ef in WaitOnLock (locallock=locallock(at)entry=0x118a158, owner=owner(at)entry=0x12490f0) at lock.c:1702
> #7 0x00000000006e5c0b in LockAcquireExtended (locktag=locktag(at)entry=0x7ffc5b5f7d10, lockmode=lockmode(at)entry=7, sessionLock=sessionLock(at)entry=0'\000', dontWait=dontWait(at)entry=0'\000',
> reportMemoryError=reportMemoryError(at)entry=1'\001') at lock.c:998
> #8 0x00000000006e6221 in LockAcquire (locktag=locktag(at)entry=0x7ffc5b5f7d10, lockmode=lockmode(at)entry=7, sessionLock=sessionLock(at)entry=0'\000', dontWait=dontWait(at)entry=0'\000') at lock.c:688
> #9 0x00000000006e3d7c in LockRelationForExtension (relation=relation(at)entry=0x7fcdc73ecb68, lockmode=lockmode(at)entry=7) at lmgr.c:340
> #10 0x00000000004b6ce4 in RelationGetBufferForTuple (relation=relation(at)entry=0x7fcdc73ecb68, len=72, otherBuffer=otherBuffer(at)entry=0, options=options(at)entry=0, bistate=bistate(at)entry=0x0,
> vmbuffer=vmbuffer(at)entry=0x7ffc5b5f7dc0, vmbuffer_other=0x0) at hio.c:526
> #11 0x00000000004b04db in heap_insert (relation=relation(at)entry=0x7fcdc73ecb68, tup=tup(at)entry=0x12768f8, cid=<optimized out>, options=options(at)entry=0, bistate=bistate(at)entry=0x0) at heapam.c:2418
> #12 0x00000000005fce33 in ExecInsert (canSetTag=1'\001', estate=0x1263148, onconflict=<optimized out>, arbiterIndexes=0x0, planSlot=0x12638c8, slot=0x12638c8, mtstate=0x1263498)
> at nodeModifyTable.c:513
> #13 ExecModifyTable (pstate=0x1263498) at nodeModifyTable.c:1781
> #14 0x00000000005e0a5a in ExecProcNode (node=0x1263498) at ../../../src/include/executor/executor.h:250
> #15 ExecutePlan (execute_once=<optimized out>, dest=0x1260e18, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_INSERT, use_parallel_mode=<optimized out>,
> planstate=0x1263498, estate=0x1263148) at execMain.c:1722
> #16 standard_ExecutorRun (queryDesc=0x1291628, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
> #17 0x00000000006fc972 in ProcessQuery (plan=<optimized out>, sourceText=0x1236598 "insert into test select i, md5(i::text) from generate_series(1,1000) AS i;", params=0x0, queryEnv=0x0,
> dest=0x1260e18, completionTag=0x7ffc5b5f81d0 "") at pquery.c:161
> #18 0x00000000006fcbb0 in PortalRunMulti (portal=portal(at)entry=0x1183888, isTopLevel=isTopLevel(at)entry=1'\001', setHoldSnapshot=setHoldSnapshot(at)entry=0'\000', dest=dest(at)entry=0x1260e18,
> altdest=altdest(at)entry=0x1260e18, completionTag=completionTag(at)entry=0x7ffc5b5f81d0 "") at pquery.c:1286
> #19 0x00000000006fd651 in PortalRun (portal=portal(at)entry=0x1183888, count=count(at)entry=9223372036854775807, isTopLevel=isTopLevel(at)entry=1'\001', run_once=run_once(at)entry=1'\001',
> dest=dest(at)entry=0x1260e18, altdest=altdest(at)entry=0x1260e18, completionTag=0x7ffc5b5f81d0 "") at pquery.c:799
> #20 0x00000000006f974f in exec_simple_query (query_string=0x1236598 "insert into test select i, md5(i::text) from generate_series(1,1000) AS i;") at postgres.c:1099
> #21 0x00000000006fb594 in PostgresMain (argc=<optimized out>, argv=argv(at)entry=0x11920e8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4088
> #22 0x0000000000479887 in BackendRun (port=0x11862b0) at postmaster.c:4405
> #23 BackendStartup (port=0x11862b0) at postmaster.c:4077
> #24 ServerLoop () at postmaster.c:1755
> #25 0x0000000000694fec in PostmasterMain (argc=argc(at)entry=3, argv=argv(at)entry=0x1165250) at postmaster.c:1363
> #26 0x000000000047aca9 in main (argc=3, argv=0x1165250) at main.c:228
>
>
>>
>>> So, I see three possible ways to fix this problem:
>>> 1. Yury Sokolov's patch with two-phase deadlock check
>>> 2. Avoid concurrent deadlock detection
>>> 3. Avoid concurrent deadlock detection + let CheckDeadLock detect all
>>> deadlocks, not only those in which the current transaction is involved.
>> 4) Don't do anything about deadlock detection, because this is just a
>> symptom five steps removed from the cause.
>>
>> We have to pay attention to fixing actual problems, rather than purely
>> benchmark ones. Complexity has a significant price.
>
> Sorry, I once again have to repeat that it was a real customer's problem,
> and the pgbench scenario described above is just the result of our
> attempts to reproduce and investigate it.
> Moreover, this is not the first time we have faced this problem of
> Postgres stalling because of a cascade of deadlock timeout
> expirations.
>
> I do not think that deadlock detection is just "a symptom".
> There is quite a simple explanation: deadlock detection tries to
> obtain all lock partition locks exclusively.
> So if under high load the deadlock timeout expires and deadlock
> detection is launched, it causes an avalanche of lock conflicts, which
> causes the deadlock timeout to expire in other backends,
> which then also try to perform deadlock detection and compete for the
> same locks... And finally the system stalls.
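>
> To make the escalation concrete, here is a minimal toy model (plain
> pthreads, not PostgreSQL code; all names and timings are invented):
> each worker normally holds one "partition" mutex briefly, but once it
> waits longer than a mock deadlock timeout it grabs all partitions
> exclusively, which pushes other waiters past their timeouts as well.
>
> /* Toy model of the cascade; not PostgreSQL code. */
> #include <pthread.h>
> #include <stdatomic.h>
> #include <stdio.h>
> #include <unistd.h>
>
> #define NUM_PARTITIONS 16
> #define NUM_WORKERS    64
> #define TIMEOUT_USEC   1000        /* mock deadlock_timeout */
>
> static pthread_mutex_t partition[NUM_PARTITIONS];
> static atomic_long     checks_started;
>
> /* Mock deadlock check: take every partition lock in order. */
> static void mock_check_deadlock(void)
> {
>     atomic_fetch_add(&checks_started, 1);
>     for (int i = 0; i < NUM_PARTITIONS; i++)
>         pthread_mutex_lock(&partition[i]);
>     usleep(100);                   /* pretend to scan the lock table */
>     for (int i = NUM_PARTITIONS - 1; i >= 0; i--)
>         pthread_mutex_unlock(&partition[i]);
> }
>
> static void *worker(void *arg)
> {
>     int part = (int) (long) arg % NUM_PARTITIONS;
>
>     for (int iter = 0; iter < 300; iter++)
>     {
>         if (pthread_mutex_trylock(&partition[part]) != 0)
>         {
>             /* Lock is busy: wait up to the mock timeout, then behave
>              * like a backend whose deadlock timer has fired. */
>             usleep(TIMEOUT_USEC);
>             if (pthread_mutex_trylock(&partition[part]) != 0)
>             {
>                 mock_check_deadlock();
>                 pthread_mutex_lock(&partition[part]);
>             }
>         }
>         usleep(10);                /* simulated work under the lock */
>         pthread_mutex_unlock(&partition[part]);
>     }
>     return NULL;
> }
>
> int main(void)
> {
>     pthread_t tid[NUM_WORKERS];
>
>     for (int i = 0; i < NUM_PARTITIONS; i++)
>         pthread_mutex_init(&partition[i], NULL);
>     for (int i = 0; i < NUM_WORKERS; i++)
>         pthread_create(&tid[i], NULL, worker, (void *) (long) i);
>     for (int i = 0; i < NUM_WORKERS; i++)
>         pthread_join(tid[i], NULL);
>
>     printf("mock deadlock checks: %ld\n", atomic_load(&checks_started));
>     return 0;
> }
>
> With only a few workers the counter should stay low; with many workers,
> each "check" holds all partitions long enough to push further waiters
> past their timeout, which is the same feedback loop described above.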
>
> The proposed patch fixes the problem: there are still some periodic
> drops in performance, but throughput never falls to zero and it is
> always possible to log in to the system.
> So I cannot agree with you that deadlock detection is "just a symptom
> five steps removed from the cause".
>
> Certainly, it is possible to find some other solution to the problem,
> for example not using heavyweight locks for relation extension.
> There is a huge discussion about it and a patch in the commitfest, but
> with no recent activity and unclear prospects:
>
> https://www.postgresql.org/message-id/flat/CA%2BTgmobuP%2BQoJgaj2HWz5VoDgVA0KP7UGtTREY%2BPeghbPtWZLg%40mail(dot)gmail(dot)com#CA+TgmobuP+QoJgaj2HWz5VoDgVA0KP7UGtTREY+PeghbPtWZLg(at)mail(dot)gmail(dot)com
>
> But once again, please notice that the same problem occurs not
> only with relation extension locks.
> Please read the description of the problem in Yury Sokolov's patch:
>
> https://commitfest.postgresql.org/18/1602/
>
> In that case there are only updates and tuple locks, but the source of
> the problem is still deadlock detection, and the simplest workaround is
> just to increase the deadlock timeout.
>
> If you want, I can provide you access to our system, where all the
> necessary infrastructure is already installed and you can easily
> reproduce the problem and investigate it yourself.
> But if something looks like a duck, swims like a duck, and quacks like
> a duck, then it probably /is/ a duck ;)
> It seems obvious that obtaining all partition locks in CheckDeadLock:
>
>     /*
>      * Acquire exclusive lock on the entire shared lock data structures.
>      * Must grab LWLocks in partition-number order to avoid LWLock deadlock.
>      *
>      * Note that the deadlock check interrupt had better not be enabled
>      * anywhere that this process itself holds lock partition locks, else
>      * this will wait forever.  Also note that LWLockAcquire creates a
>      * critical section, so that this routine cannot be interrupted by
>      * cancel/die interrupts.
>      */
>     for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
>         LWLockAcquire(LockHashPartitionLockByIndex(i), LW_EXCLUSIVE);
>
> can cause huge contention with other backends, which in turn can
> cause the deadlock timeout to expire in those backends and escalate the
> problem.
>
> On the other hand, the fact that each of these:
>
> a) increasing the deadlock timeout
> b) avoiding concurrent execution of the deadlock check
> c) performing the first stage of the deadlock check under a shared lock
>
> has a very noticeable effect on the observed behavior, and that some of
> them eliminate the problem completely,
> makes me think that deadlock detection is not a symptom but the main
> source of the problem.
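>
> To illustrate option (b) in the toy model above (a sketch only, not the
> attached patch; "check_gate" is an invented name), the mock check can
> be gated behind a single extra mutex so that only one worker runs it
> at a time:
>
> /* Drop-in replacement for mock_check_deadlock() in the sketch above. */
> static pthread_mutex_t check_gate = PTHREAD_MUTEX_INITIALIZER;
>
> static void mock_check_deadlock(void)
> {
>     atomic_fetch_add(&checks_started, 1);
>
>     /* If another worker is already checking, do not pile onto the
>      * partition locks; just go back to waiting for our own lock. */
>     if (pthread_mutex_trylock(&check_gate) != 0)
>         return;
>
>     for (int i = 0; i < NUM_PARTITIONS; i++)
>         pthread_mutex_lock(&partition[i]);
>     usleep(100);                   /* pretend to scan the lock table */
>     for (int i = NUM_PARTITIONS - 1; i >= 0; i--)
>         pthread_mutex_unlock(&partition[i]);
>
>     pthread_mutex_unlock(&check_gate);
> }
>
> With this change at most one worker touches all partitions at any time,
> so the stall does not compound; option (c) would presumably take the
> partitions in shared mode on the first pass and escalate to exclusive
> mode only when a possible deadlock is found.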
>
> And one last note. I know that 1000 connections is too many for
> Postgres. But:
> 1. In the customer's case the problem is reproduced even with 100 connections.
> 2. On modern SMP systems with hundreds of cores, running hundreds of
> backends is not something exotic or stupid.
>
> And here we are faced with problems which Postgres has never had
> before:
> 1. Too large snapshots and the long time needed to compute them
> 2. High contention for ProcArray locks
> 3. Very inefficient memory usage because of local caches and prepared
> statements.
> ...
>
> On such systems we face many new bottlenecks, and we really
> need to address them.
> I hope nobody thinks that Postgres should work well only on
> notebooks and quad-core desktops?
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
deadlock-2.patch text/x-patch 2.5 KB
