Re: stuck spin lock with many concurrent users

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: Inoue(at)tpf(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: stuck spin lock with many concurrent users
Date: 2001-07-03 19:01:05
Message-ID: 25284.994186865@sss.pgh.pa.us
Lists: pgsql-hackers

Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
> I added some code to HandleDeadLock to measure how long the
> LockLockTable and DeadLockCheck calls take. The following are the
> results from running pgbench -c 1000 (it failed with a stuck spinlock
> error). "real time" shows how long the calls actually ran (measured
> with gettimeofday). "user time" and "system time" are measured by
> calling getrusage. The time unit is milliseconds.

> function      | measure     | min (ms) | max (ms) | avg (ms)
> --------------+-------------+----------+----------+------------------
> LockLockTable | real time   |        0 |   867873 | 152874.9015151515
> LockLockTable | user time   |        0 |       30 |      1.2121212121
> LockLockTable | system time |        0 |     2140 |    366.5909090909
> DeadLockCheck | real time   |        0 |    87671 |   3463.6996197719
> DeadLockCheck | user time   |        0 |      330 |     14.2205323194
> DeadLockCheck | system time |        0 |      100 |      2.5095057034

Hm. It doesn't seem that DeadLockCheck is taking very much of the time.
I have to suppose that the problem is (once again) our inefficient
spinlock code.
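For readers who have not looked at the spinlock code, the general spin-then-sleep scheme under discussion can be sketched roughly as follows. This is a minimal illustration, not PostgreSQL's actual s_lock.c: C11 atomic_flag stands in for the platform-specific test-and-set instruction, and the names toy_spinlock, toy_acquire, TOY_SPINS_PER_DELAY and the delay/timeout parameters are all hypothetical. The lock spins a fixed number of times, sleeps for a short delay, and after a bounded number of sleeps gives up, which corresponds to the "stuck spinlock" failure.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() under strict ISO C modes */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define TOY_SPINS_PER_DELAY 100    /* hypothetical: TAS attempts between sleeps */

typedef struct {
    atomic_flag flag;              /* set = held, clear = free */
} toy_spinlock;

/* One test-and-set attempt; true if this caller now holds the lock. */
static bool toy_tas(toy_spinlock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->flag, memory_order_acquire);
}

static void toy_release(toy_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->flag, memory_order_release);
}

/* Sleep for roughly usec microseconds. A kernel that only releases sleepers
   at clock ticks may round this up to the next tick, so waiters that went to
   sleep at different moments can all wake at the same tick. */
static void toy_delay(long usec)
{
    struct timespec ts;
    ts.tv_sec = usec / 1000000;
    ts.tv_nsec = (usec % 1000000) * 1000;
    nanosleep(&ts, NULL);
}

/* Spin, then sleep, then spin again. Returns false once max_delays sleeps
   have passed without getting the lock (the "stuck spinlock" case).
   *delays_used (if non-NULL) receives the number of sleeps taken.
   Nothing here queues waiters: after each sleep, whoever happens to run
   first takes the lock. */
static bool toy_acquire(toy_spinlock *lock, long delay_usec, int max_delays, int *delays_used)
{
    int delays = 0;

    assert(max_delays >= 0);
    for (;;) {
        for (int i = 0; i < TOY_SPINS_PER_DELAY; i++) {
            if (toy_tas(lock)) {
                if (delays_used != NULL)
                    *delays_used = delays;
                return true;
            }
        }
        if (delays >= max_delays) {
            if (delays_used != NULL)
                *delays_used = delays;
            return false;          /* give up: caller would report a stuck spinlock */
        }
        toy_delay(delay_usec);
        delays++;
    }
}
```

With delay_usec = 1000 and max_delays = 3, a second toy_acquire on a lock the caller already holds returns false after three sleeps. Because the timeout is counted in sleeps, its wall-clock length depends on how the kernel rounds each delay.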

If you think about it, on a typical platform where processes waiting for
a time delay are released at a clock tick, what's going to be happening
is that a whole lot of processes blocked on the spinlock will all be awoken in the
same clock tick interrupt. The first one of these that gets to run will
acquire the spinlock, if it's free, and the rest will go back to sleep
and try again at the next tick. This could be highly unfair depending
on just how the kernel's scheduler works --- for example, one could
easily believe that the waiters might be awoken in process-number order,
in which case backends with high process numbers might never get to
acquire the spinlock, or at least would have such low probability of
winning that they are prone to "stuck spinlock" timeout.
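The unfairness argument can be made concrete with a small deterministic model. This is a hypothetical simulation, not a measurement of PostgreSQL: at each tick the waiting backends are scanned in wake-up order and the first waiter takes the lock, finishing its short critical section within the tick; it then spends work_ticks ticks on other work before wanting the lock again. The function name simulate_ticks, the rotate option and MAX_BACKENDS are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BACKENDS 64   /* hypothetical upper bound for this toy model */

/*
 * Toy model of backends sleeping on one spinlock (not PostgreSQL code).
 * Each tick, backends are scanned in wake-up order; the first one that is
 * waiting runs first and takes the lock, then does work_ticks ticks of other
 * work before it wants the lock again. All other waiters sleep until the
 * next tick.
 *
 * rotate == false: wake-up order is always process number 0, 1, 2, ...
 * rotate == true:  the scan starts at process (tick % n), a fairer order.
 *
 * wins[i] receives how many times backend i acquired the lock.
 * Requires 0 < n <= MAX_BACKENDS.
 */
static void simulate_ticks(int n, int work_ticks, int ticks, bool rotate, int wins[])
{
    int wants_lock_at[MAX_BACKENDS];   /* first tick at which backend i waits again */

    assert(n > 0 && n <= MAX_BACKENDS);
    for (int i = 0; i < n; i++) {
        wins[i] = 0;
        wants_lock_at[i] = 0;          /* everyone starts out waiting */
    }

    for (int t = 0; t < ticks; t++) {
        for (int k = 0; k < n; k++) {
            int pid = rotate ? (t + k) % n : k;
            if (wants_lock_at[pid] <= t) {          /* waiting, and runs first: it wins */
                wins[pid]++;
                wants_lock_at[pid] = t + 1 + work_ticks;
                break;                              /* the rest sleep until the next tick */
            }
        }
    }
}
```

For example, simulate_ticks(5, 2, 30, false, wins) leaves wins as {10, 10, 10, 0, 0}: backends 0 to 2 keep the lock cycling among themselves, while backends 3 and 4 never win and, in a real system, would eventually hit the stuck-spinlock timeout. With rotate = true, every backend gets 6 acquisitions over the same 30 ticks.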

We really need to look at replacing the spinlock mechanism with
something more efficient.

regards, tom lane
