From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Marko Kreen <marko(at)l-t(dot)ee> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Spinlocks, yet again: analysis and proposed patches |
Date: | 2005-09-13 14:10:13 |
Message-ID: | 22767.1126620613@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Marko Kreen <marko(at)l-t(dot)ee> writes:
> On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote:
>> However, given that we are only expecting
>> the spinlock to be held for a couple dozen instructions, using the
>> kernel futex mechanism is huge overkill --- the in-kernel overhead
>> to manage the futex state is almost certainly several orders of
>> magnitude more than the delay we actually want.
> Why do you think so? AFAIK in the uncontended case there will be no
> kernel access, only atomic inc/dec.
In the uncontended case, we never even enter s_lock() and so the entire
mechanism of yielding is irrelevant. The problem that's being exposed
by these test cases is that on multiprocessors, you can see a
significant rate of spinlock contention (on the order of 100 events/second,
which is still a tiny fraction of the number of TAS calls) and our
existing mechanism for dealing with contention is just not efficient
enough.
> In the contended case you'll want a task switch anyway, so the futex
> management should not matter.
No, we DON'T want a task switch. That's the entire point: in a
multiprocessor, it's a good bet that the spinlock is held by a task
running on another processor, and doing a task switch will take orders
of magnitude longer than just spinning until the lock is released.
You should yield only after spinning long enough to make it a strong
probability that the spinlock is held by a process that's lost the
CPU and needs to be rescheduled.
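
To make the spin-then-yield idea concrete, here is a minimal sketch in C11. It is not the code in s_lock.c or any of the proposed patches: the names demo_spinlock, demo_lock, demo_unlock, SPINS_BEFORE_YIELD and the max_yields parameter are all hypothetical, and the spin count of 100 is an arbitrary placeholder rather than a measured value. The fast path is a single test-and-set; the slow path spins for a while before each sched_yield(), and gives up after a bounded number of yields, loosely analogous to a "stuck spinlock" report.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical tuning knob: failed attempts before we give up the CPU. */
#define SPINS_BEFORE_YIELD 100

typedef struct {
    atomic_flag held;
} demo_spinlock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

typedef enum {
    DEMO_FAST,       /* acquired on the first test-and-set, no contention */
    DEMO_CONTENDED,  /* acquired after spinning and possibly yielding */
    DEMO_STUCK       /* gave up after max_yields yields */
} demo_result;

/*
 * Try to take the lock. Spin first, on the bet that the holder is
 * running on another processor and will release soon; yield only
 * after SPINS_BEFORE_YIELD consecutive failures.
 */
static demo_result demo_lock(demo_spinlock *lock, unsigned max_yields)
{
    unsigned spins = 0;
    unsigned yields = 0;

    assert(lock != NULL);

    /* Fast path: the uncontended case never reaches the loop below. */
    if (!atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire))
        return DEMO_FAST;

    /* Slow path: entered only on contention. */
    while (atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire)) {
        if (++spins >= SPINS_BEFORE_YIELD) {
            if (yields >= max_yields)
                return DEMO_STUCK;   /* caller decides how to report this */
            sched_yield();           /* holder has probably lost its CPU */
            yields++;
            spins = 0;
        }
    }
    return DEMO_CONTENDED;
}

static void demo_unlock(demo_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

A caller would pair demo_lock(&lock, n) with demo_unlock(&lock) around a short critical section. Whether sched_yield() is the right thing to call at the yield point is a separate question from when to call it; the sketch only illustrates the ordering of spinning before yielding.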
> If you don't want Linux-specific locking in core code, then
> it's another matter...
Well, it's true, we don't particularly want a one-platform solution,
but if it did what we wanted we might hold our noses and use it anyway.
(I think, BTW, that using futexes at the spinlock level is misguided;
what would be interesting would be to see if we could substitute for
both LWLock and spinlock logic with one futex-based module.)
>> I also saw fairly frequent "stuck spinlock" panics when running
>> more queries than there were processors --- this despite increasing
>> NUM_DELAYS to 10000 in s_lock.c. So I don't trust sched_yield
>> anymore. Whatever it's doing in Linux 2.6 isn't what you'd expect.
>> (I speculate that it's set up to only yield the processor to other
>> processes already affiliated to that processor. In any case, it
>> is definitely capable of getting through 10000 yields without
>> running the guy who's holding the spinlock.)
> This is intended behaviour of sched_yield.
> http://lwn.net/Articles/31462/
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112432727428224&w=2
No; that page still says specifically "So a process calling
sched_yield() now must wait until all other runnable processes in the
system have used up their time slices before it will get the processor
again." I can prove that that is NOT what happens, at least not on
a multi-CPU Opteron with current FC4 kernel. However, if the newer
kernels penalize a process calling sched_yield as heavily as this page
claims, then it's not what we want anyway ...
regards, tom lane