Re: Proposal of tunable fix for scalability of 8.4

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Scott Carey <scott(at)richrelevance(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Proposal of tunable fix for scalability of 8.4
Date: 2009-03-18 22:11:28
Message-ID: 49C17190.9050003@sun.com
Lists: pgsql-performance

On 03/18/09 17:25, Robert Haas wrote:
> On Wed, Mar 18, 2009 at 1:43 PM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
>
>>>> It's worth ruling out given that even if the likelihood is small, the fix is
>>>> easy. However, I don't see the throughput drop from peak as more
>>>> concurrency is added that is the hallmark of this problem -- usually with a
>>>> lot of context switching and a sudden increase in CPU use per transaction.
>>>>
>>> The problem is that the proposed "fix" bears a strong resemblance to
>>> attempting to improve your gas mileage by removing a few non-critical
>>> parts from your car, like, say, the bumpers, muffler, turn signals,
>>> windshield wipers, and emergency brake.
>>>
>> The fix I was referring to as easy was using a connection pooler -- as a
>> reply to the previous post. Even if it's a low likelihood that the connection
>> pooler fixes this case, it's worth looking at.
>>
>
> Oh, OK. There seem to be some smart people saying that's a pretty
> high-likelihood fix. I thought you were talking about the proposed
> locking change.
>
>
>>> While it's true that the car
>>> might be drivable in that condition (as long as nothing unexpected
>>> happens), you're going to have a hard time convincing the manufacturer
>>> to offer that as an options package.
>>>
>> The original poster's request is for a config parameter, for experimentation
>> and testing by the brave. My own request was for that version of the lock to
>> prevent possible starvation but improve performance by unlocking all shared
>> at once, then doing all exclusives one at a time next, etc.
>>
>
> That doesn't prevent starvation in general, although it will for some workloads.
>
> Anyway, it seems rather pointless to add a config parameter that isn't
> at all safe, and adds overhead to a critical part of the system for
> people who don't use it. After all, if you find that it helps, what
> are you going to do? Turn it on in production? I just don't see how
> this is any good other than as a thought-experiment.
>

Actually, from what I have seen, the patch I submitted shows no overhead,
and I think it is useful: depending on the workload, it can be turned on
even in production.

> At any rate, as I understand it, even after Jignesh eliminated the
> waits, he wasn't able to push his CPU utilization above 48%. Surely
> something's not right there. And he also said that when he added a
> knob to control the behavior, he got a performance improvement even
> when the knob was set to 0, which corresponds to the behavior we have
> already anyway. So I strongly suspect that there's something wrong
> with either the system or the test. Until that's understood and
> fixed, I don't think that looking at the numbers is worth much.
>
>

I don't think anything is majorly wrong with my system. Sometimes
PostgreSQL locks are in play, and sometimes OS/system-related locks are
in play (network, I/O, file system, etc.). With my patch, once the
ProcArrayLock waiting problem is fixed, other PostgreSQL locks come into
play: CLogControlLock, WALInsertLock, etc. Out of the box, we currently
have no means of tweaking anything in production if you do land in that
problem. With the patch, there is a knob for tuning the lock bottlenecks
for the main workload that the system was put into production to serve.
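
To make the idea of a wake-up knob concrete, here is a toy sketch in
Python. It is not the patch and not PostgreSQL code. It models a lock's
wait queue as a list of 'S' (shared) and 'X' (exclusive) waiters, and the
flag wake_all_shared is a hypothetical name invented for this
illustration. With the flag off, the function wakes either a single
exclusive waiter at the head of the queue or the run of shared waiters at
the head, which is my understanding of the stock behavior. With the flag
on, it follows one rough reading of the variant quoted above: release
every shared waiter at once and leave the exclusive waiters to be woken
one at a time afterwards.

```python
from collections import deque


def pick_waiters_to_wake(queue, wake_all_shared=False):
    """Pop and return the waiters to wake when the lock is released.

    queue           -- deque of 'S' (shared) / 'X' (exclusive) waiters,
                       oldest first; it is modified in place.
    wake_all_shared -- hypothetical knob (illustration only).
    """
    if not queue:
        return []

    # An exclusive waiter at the head is always woken alone.
    if queue[0] == "X":
        return [queue.popleft()]

    woken = []
    if wake_all_shared:
        # Variant: wake every shared waiter in the queue, keeping the
        # exclusive waiters queued in their original order.
        remaining = deque()
        while queue:
            waiter = queue.popleft()
            if waiter == "S":
                woken.append(waiter)
            else:
                remaining.append(waiter)
        queue.extend(remaining)
    else:
        # Default: wake only the consecutive shared waiters at the head.
        while queue and queue[0] == "S":
            woken.append(queue.popleft())
    return woken
```

For a queue S S X S X, the default mode wakes the first two shared
waiters and leaves X S X queued, while the variant wakes all three shared
waiters and leaves X X. The second mode lets a late shared waiter jump
ahead of an earlier exclusive one, which is where the fairness and
starvation concerns raised in this thread come from.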

I still haven't seen any downsides with the patch, other than that it
highlights other bottlenecks in the system. (For example, I haven't
seen a run where the tpm on my workload decreases as you increase the
number.) What I am suggesting is: run the patch, see whether you can
find a workload where performance gets worse, and check the lock
statistics output to see whether the bottleneck is being pushed
elsewhere, most likely to WALInsertLock or CLogControlLock. If so, this
patch gives you the right tweaking opportunity: reduce stress on
ProcArrayLock for a workload while still not seriously stressing
WALInsertLock or CLogControlLock.
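
One way to read "is the bottleneck being pushed elsewhere" from lock
statistics is to compare each lock's share of total wait time between
two runs. The Python sketch below is only an illustration under
assumptions: the input format (a dict mapping lock name to total wait
time per run) and both function names are made up here, and how the
numbers are collected is left open.

```python
def wait_share(stats):
    """Convert {lock_name: wait_time} into {lock_name: fraction of total}."""
    total = sum(stats.values())
    if total == 0:
        return {}
    return {name: waited / total for name, waited in stats.items()}


def shifted_bottleneck(before, after):
    """Return (lock_name, growth) for the lock whose share of total wait
    time grew the most between two runs, or None if 'after' is empty.

    before, after -- hypothetical {lock_name: wait_time} dicts, one per run.
    """
    share_before = wait_share(before)
    share_after = wait_share(after)
    growth = {
        name: share - share_before.get(name, 0.0)
        for name, share in share_after.items()
    }
    if not growth:
        return None
    name = max(growth, key=growth.get)
    return name, growth[name]
```

If the lock that comes back is WALInsertLock or CLogControlLock and its
share grew substantially, the setting has probably moved the contention
rather than removed it, and a lower setting may be the better trade-off
for that workload.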

Right now, the standard answer applies: no, you are running the wrong
workload for PostgreSQL; use a connection pooler or your own application
logic. Or maybe: you have too many users for PostgreSQL; use some
proprietary database.

-Jignesh

>> I alluded to the three main ways of dealing with lock contention elsewhere.
>> Avoiding locks, making finer grained locks, and making locks faster.
>> All are worthy. Some are harder to do than others. Some have been heavily
>> tuned already. It's a case-by-case basis. And regardless, the unfair lock
>> is a good test tool.
>>
>
> In view of the caveats above, I'll give that a firm maybe.
>
> ...Robert
>
