Re: Proposal of tunable fix for scalability of 8.4

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)sun(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Proposal of tunable fix for scalability of 8.4
Date: 2009-03-20 03:45:17
Message-ID: 603c8f070903192045g4817249jebc8051fe9598dd4@mail.gmail.com
Lists: pgsql-performance

On Thu, Mar 19, 2009 at 5:43 PM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
>> Well, unless I'm misunderstanding something, waking all waiters every
>> time could lead to arbitrarily long delays for writers on mostly
>> read-only workloads... and by arbitrarily long, we mean to say
>> "potentially just about forever".  That doesn't sound safe for
>> production to me.
>
> The other discussion going on indicates that that condition already can
> happen, shared can always currently cut in line while other shared locks
> have the lock, though I don't understand all the details.

No. If the first process waiting for an LWLock wants an exclusive
lock, we wake up that process, and only that process. If the first
process waiting for an LWLock wants a shared lock, we wake up that
process and the processes that follow it in the queue which also
want shared locks. But when we come to a process that wants an
exclusive lock, we stop. So if the wait queue looks like this:
SSSXSSSXSSS, then the first three processes will be woken up, but the
remainder will not. The new wait queue will look like this: XSSSXSSS
- and the exclusive waiter at the head of the queue is guaranteed to
get the next turn.
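
To make that rule concrete, here is a minimal sketch of the wakeup
policy (invented types and names, not the real lwlock.c code),
assuming the wait queue is a simple array in which each waiter is
marked shared or exclusive:

#include <stdbool.h>
#include <stddef.h>

typedef enum { WAIT_SHARED, WAIT_EXCLUSIVE } WaitMode;

typedef struct
{
    WaitMode mode;
    bool     woken;
} Waiter;

/*
 * Wake the waiter at the head of the queue.  If it wants a shared
 * lock, also wake every consecutive shared waiter behind it, stopping
 * at the first exclusive waiter.  Returns the number of waiters woken.
 */
static size_t
wake_from_head(Waiter *queue, size_t nwaiters)
{
    size_t      nwoken;

    if (nwaiters == 0)
        return 0;

    /* The head of the queue is always woken. */
    queue[0].woken = true;
    nwoken = 1;

    /* An exclusive waiter at the head gets the lock alone. */
    if (queue[0].mode == WAIT_EXCLUSIVE)
        return nwoken;

    /* Otherwise, wake the run of shared waiters that follows it. */
    for (size_t i = 1; i < nwaiters && queue[i].mode == WAIT_SHARED; i++)
    {
        queue[i].woken = true;
        nwoken++;
    }

    return nwoken;
}

Run against SSSXSSSXSSS, this wakes only the first three shared
waiters and leaves XSSSXSSS queued, with the exclusive waiter now at
the head.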

If you wake up everybody, then the new queue will look like this: XX.
Superficially that's a good thing, because you let 9 guys run rather
than 3. But suppose that while those 9 guys hold the lock, twenty
more shared waiters join the end of the queue, so it looks like this:
XXSSSSSSSSSSSSSSSSSSSS. Now when the last of the 9 guys releases the
lock, we wake up everybody again, and since there are a lot more S
guys than X guys, odds are good that one of the S guys will grab the
lock first. The other S guys will all acquire the lock too, but the X
guys are frozen out. This whole cycle can repeat: by the time those
20 guys are done with their S locks, there can be 20 more guys waiting
for S locks, and once again when we wake everyone up, one of the new S
guys will probably grab it first. This can continue for an
indefinitely long period of time.
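
Here is a toy simulation of that cycle (all numbers and names are
invented, purely to illustrate the shape of the problem): each round,
every waiter is woken and one of them wins the race at random; if the
winner wants a shared lock, all the other shared waiters pile in with
it while the exclusive waiter re-queues, and a fresh batch of shared
waiters arrives before the next release.

#include <stdio.h>
#include <stdlib.h>

/*
 * Toy model of the "wake all waiters" policy.  Each round all waiters
 * are woken and one of them (chosen at random) wins the race; a shared
 * winner lets every other shared waiter in too, so the exclusive
 * waiter goes back to waiting while a new batch of shared waiters
 * arrives.  The writer's wait is geometrically distributed with no
 * upper bound, and its expected length grows with the shared arrival
 * rate.
 */
int
main(void)
{
    int         shared = 20;    /* shared waiters currently queued */
    int         exclusive = 1;  /* the writer we care about */
    int         arrivals = 20;  /* new shared waiters per shared hold */
    int         rounds = 0;

    srand(42);

    for (;;)
    {
        int         winner = rand() % (shared + exclusive);

        rounds++;
        if (winner >= shared)
            break;              /* the exclusive waiter finally won */

        /* A shared waiter won: every shared waiter gets the lock, the
         * writer re-queues, and more shared waiters show up while the
         * lock is held. */
        shared = arrivals;
    }

    printf("exclusive waiter acquired the lock after %d wake-ups\n",
           rounds);
    return 0;
}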

Now, of course, EVENTUALLY one of the X guys will probably beat out
all the S-lock waiters and he'll get to do his thing. But there's no
upper bound on how long this can take, and if the rate at which S-lock
waiters are joining the queue is much higher than the rate at which
X-lock waiters are joining the queue, it may be quite a long time.
Even if the overall system throughput is better with this change, the
fact that the guys who need the X-lock get seriously shafted is a
really serious problem. If I start a million transactions on my
system and they all complete in an average of 1 second each, that sounds
pretty good - unless it's because 999,999 of them completed almost
instantaneously and the last one took a million seconds.
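
As a quick arithmetic check of that last point (the one-microsecond
figure for the fast transactions is just an assumed stand-in for
"almost instantaneously"):

#include <stdio.h>

/*
 * 999,999 transactions finishing almost instantly (assume ~1
 * microsecond each) plus one starved transaction taking 1,000,000
 * seconds still average out to about one second apiece.
 */
int
main(void)
{
    double      fast_total = 999999 * 0.000001;    /* seconds */
    double      slow_total = 1000000.0;            /* seconds */

    printf("average latency = %.6f seconds\n",
           (fast_total + slow_total) / 1000000.0);
    return 0;
}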

Now, I'm not familiar enough with the use of ProcArrayLock to suggest
a workload that will produce this pathological behavior in PG. But
I'm pretty confident, based on what I know about locking in general,
that such workloads exist.

> Also, the tests on the 'wake all' version clearly aren't starving anything
> in a load test with thousands of threads and very heavy lock contention,
> mostly for shared locks.
> Instead throughput increases and all wait times decrease.

On the average, yes...

> There are several other proposals to make starvation less possible (wake
> only shared and other proposals that alternate between shared and exclusive;
> waking only X sized chunks, etc -- its all just investigation into fixing
> what can be improved on -- solutions that are easily testable should not
> just be thrown out: the first ones were just the easiest to try).

Alternating between shared and exclusive is safe. But a lot more
testing in a lot more situations would be needed to determine whether
it is better, I think. Waking chunks of a certain size will, I
believe, produce a more complicated version of the problem described
above.
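
For what it's worth, here is one possible reading of "alternate
between shared and exclusive", again with invented names and the same
toy Waiter array as the earlier sketch (an illustration of the idea
only, not anyone's actual patch): after a batch of shared wake-ups,
the next wake-up goes to an exclusive waiter if one is queued; after
an exclusive wake-up, every queued shared waiter is woken together.

#include <stdbool.h>
#include <stddef.h>

typedef enum { WAIT_SHARED, WAIT_EXCLUSIVE } WaitMode;

typedef struct
{
    WaitMode mode;
    bool     woken;
} Waiter;

/* Remembers which side got the previous wake-up. */
static bool last_wake_was_shared = false;

static size_t
wake_alternating(Waiter *queue, size_t nwaiters)
{
    size_t      nwoken = 0;

    /* After a shared batch, give a writer the next turn if one exists. */
    if (last_wake_was_shared)
    {
        for (size_t i = 0; i < nwaiters; i++)
        {
            if (queue[i].mode == WAIT_EXCLUSIVE)
            {
                queue[i].woken = true;
                last_wake_was_shared = false;
                return 1;
            }
        }
        /* No writer queued; fall through and wake the readers again. */
    }

    /* Wake every shared waiter currently in the queue. */
    for (size_t i = 0; i < nwaiters; i++)
    {
        if (queue[i].mode == WAIT_SHARED)
        {
            queue[i].woken = true;
            nwoken++;
        }
    }
    if (nwoken > 0)
    {
        last_wake_was_shared = true;
        return nwoken;
    }

    /* Only writers are waiting: wake the one at the head. */
    if (nwaiters > 0)
    {
        queue[0].woken = true;
        last_wake_was_shared = false;
        return 1;
    }
    return 0;
}

A policy like this bounds how long a writer can be skipped, at the
cost of waking shared waiters out of queue order; whether that is a
net win is exactly the kind of thing that would need the broader
testing mentioned above.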

...Robert
