Re: stuck spin lock with many concurrent users

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: stuck spin lock with many concurrent users
Date: 2001-06-26 08:53:04
Message-ID: 20010626175304R.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes
> > >>> How can I check it?
> > >>
> > >> The 'stuck' message should at least give you a code location...
> >
> > > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
> >
> > Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
> > (and not, say, a buffer spinlock). You could try adding some
> > debug logging here, although the output would be voluminous.
> > But what would really be useful is a stack trace for the stuck
> > process. Consider changing the s_lock code to abort() when it
> > gets a stuck spinlock --- then you could gdb the coredump.
>
> Nice idea. I will try that.

It appeared that the deadlock checking timer seems to be the source of
the problem. With the default settings, it checks deadlocks every 1
second PER backend. So if there are 1000 backends, every 1 msec
there's a signal and a shared memory locking in average. That would be
too much. If increase the dealock_timeout to , say 100000, the problem
seems gone. Also the performance increased SIGNIFICANTLY. Before that
I got only 1-2 TPS, but now I get ~20 TPS using pgbench -c 1000.

Here is the backtrace:

#0 0x2ab56d21 in __kill () from /lib/libc.so.6
#1 0x2ab56996 in raise (sig=6) at ../sysdeps/posix/raise.c:27
#2 0x2ab580b8 in abort () at ../sysdeps/generic/abort.c:88
#3 0x80ece1a in s_lock_stuck (lock=0x2ac2d016 "\001",
file=0x816e7bc "spin.c", line=158) at s_lock.c:70
#4 0x80ecf3e in s_lock_sleep (spins=20001, timeout=100000000, microsec=5000,
lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c", line=158) at s_lock.c:109
#5 0x80ecfa3 in s_lock (lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c",
line=158) at s_lock.c:136
#6 0x80efb4d in SpinAcquire (lockid=6) at spin.c:158
#7 0x80f2305 in HandleDeadLock (postgres_signal_arg=14) at proc.c:819
#8 <signal handler called>
#9 0x2abeb134 in semop (semid=32786, sops=0x7fffeebc, nsops=1)
at ../sysdeps/unix/sysv/linux/semop.c:34
#10 0x80ee460 in IpcSemaphoreLock (semId=32786, sem=13, interruptOK=1 '\001')
at ipc.c:426
#11 0x80f217f in ProcSleep (lockMethodTable=0x81c1708, lockmode=6,
lock=0x2ce0ab18, holder=0x2ce339b0) at proc.c:666
#12 0x80f14ff in WaitOnLock (lockmethod=1, lockmode=6, lock=0x2ce0ab18,
holder=0x2ce339b0) at lock.c:955
#13 0x80f1298 in LockAcquire (lockmethod=1, locktag=0x7fffeffc, xid=130139,
lockmode=6) at lock.c:739
#14 0x80f0a23 in LockPage (relation=0x2dbeb9d0, blkno=0, lockmode=6)
#15 0x8071ceb in RelationGetBufferForTuple (relation=0x2dbeb9d0, len=132)
at hio.c:97
#16 0x8070293 in heap_update (relation=0x2dbeb9d0, otid=0x7ffff114,
newtup=0x82388c8, ctid=0x7ffff0b0) at heapam.c:1737
#17 0x80b6825 in ExecReplace (slot=0x823af60, tupleid=0x7ffff114,
estate=0x8238a58) at execMain.c:1450
#18 0x80b651e in ExecutePlan (estate=0x8238a58, plan=0x8238d00,
operation=CMD_UPDATE, numberTuples=0, direction=ForwardScanDirection,
destfunc=0x823b680) at execMain.c:1125
#19 0x80b5af3 in ExecutorRun (queryDesc=0x8239080, estate=0x8238a58,
feature=3, count=0) at execMain.c:233
#20 0x80f6d93 in ProcessQuery (parsetree=0x822bc18, plan=0x8238d00,
dest=Remote) at pquery.c:295
#21 0x80f599b in pg_exec_query_string (
query_string=0x822b8c0 "update accounts set abalance = abalance + 277 where aid = 41148\n", dest=Remote, parse_context=0x81fc850) at postgres.c:810
#22 0x80f68c6 in PostgresMain (argc=4, argv=0x7ffff380, real_argc=3,
real_argv=0x7ffffc94, username=0x81cd981 "t-ishii") at postgres.c:1908
#23 0x80e1ee3 in DoBackend (port=0x81cd718) at postmaster.c:2120
#24 0x80e1acc in BackendStartup (port=0x81cd718) at postmaster.c:1903
#25 0x80e0e26 in ServerLoop () at postmaster.c:995
#26 0x80e0853 in PostmasterMain (argc=3, argv=0x7ffffc94) at postmaster.c:685
#27 0x80c4865 in main (argc=3, argv=0x7ffffc94) at main.c:175
#28 0x2ab509cb in __libc_start_main (main=0x80c4750 <main>, argc=3,
argv=0x7ffffc94, init=0x80656c4 <_init>, fini=0x81395ac <_fini>,
rtld_fini=0x2aab5ea0 <_dl_fini>, stack_end=0x7ffffc8c)
at ../sysdeps/generic/libc-start.c:92

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hiroshi Inoue 2001-06-26 09:41:08 Re: stuck spin lock with many concurrent users
Previous Message Bruce Momjian 2001-06-26 04:56:04 Re: Encrypting pg_shadow passwords