

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Nils Goroll <slink(at)schokola(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Update on the spinlock->pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
Date: 2012-07-01 20:25:27
Lists: pgsql-hackers
On Fri, Jun 29, 2012 at 10:07 AM, Nils Goroll <slink(at)schokola(dot)de> wrote:

> On 06/28/12 05:21 PM, Jeff Janes wrote:
>> It looks like the hacked code is slower than the original.  That
>> doesn't seem so good to me.  Am I misreading this?
> No, you are right - in a way. This is not about maximizing tps, this is about
> maximizing efficiency under load situations

But why wouldn't this maximized efficiency present itself as increased TPS?

>> Also, 20 transactions per connection is not enough of a run to make
>> any evaluation on.
> As you can see I've repeated the tests 10 times. I've tested slight variations
> as mentioned above, so I was looking for quick results with acceptable variation.

Testing it 10 times doesn't necessarily improve things.  You get ten
times as many transactions, but also ten times as many start-up and
tear-down events polluting the numbers.  (Unless start-up and
tear-down are exactly the things you are trying to measure.)  And once
you change your benchmark so that it is testing what you actually want
to test, the problem will probably get even worse.  You should use at
least -T 30, rather than -t 20.

Anyway, your current benchmark speed of around 600 TPS over such
short time periods suggests you are limited by fsyncs.  It is going to
be pretty hard to hit a spinlock bottleneck in simple queries like
pgbench's as long as that is the case.  You could set fsync=off,
or change your benchmark to a read-only one like -S, or better yet
use the -P option I've been trying to get into pgbench.

Does your production server have fast fsyncs (BBU) while your test
server does not?
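A quick way to check on the test box (just a sketch; "bench" is a
placeholder database name assumed to have been initialized with
pgbench -i) is to compare a write-heavy run against a read-only one:

```shell
# Hypothetical commands; "bench" is a placeholder pgbench database.
# Baseline write-heavy run with default fsync behavior:
pgbench -c 8 -j 4 -T 30 bench

# Repeat after setting fsync = off in postgresql.conf and restarting
# the server, to see how much of the ceiling is fsync latency
# (never do this on a production cluster):
pgbench -c 8 -j 4 -T 30 bench

# Read-only run (-S), which takes fsync out of the picture entirely:
pgbench -S -c 8 -j 4 -T 30 bench
```

If the fsync=off and -S runs are dramatically faster than the
baseline, the original numbers were fsync-bound, not lock-bound.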

>>> Regarding the actual production issue, I did not manage to synthetically provoke
>>> the saturation we are seeing in production using pgbench - I could not even get
>>> anywhere near the production load.
>> What metrics/tools are you using to compare the two loads?
> We've got cpu + load avg statistics for the old+new machine and compared values
> before/after the migration. The user load presumably is comparable and the main
> metric is "users complaining" vs. "users happy".

The users probably don't care about the load average.  Presumably they
are unhappy because of lowered throughput (TPS) or higher peak latency
(-l switch in pgbench).  So I think the only use of load average is to
verify that your benchmark is nothing like your production workload.
(But it doesn't go the other way around, just because the load
averages are similar doesn't mean the actual workloads are.)

> I wish we had a synthetic benchmark close to the actual load, and I hope that
> one of the insights from this will be that the customer should have one.

If they could simulate a workload close to what they actually do, that
would be great.  But surely even a fairly simple pgbench
configuration can get much closer to it than what you are running now.

>> What is the production load like?
> Here's an anonymized excerpt from a pgFouine analysis of 137 seconds worth of
> query logs at "average production user load".
> Type    Count   Percentage
> SELECT  80,217  27.1
> INSERT   6,248   2.1
> UPDATE  37,159  12.6
> DELETE   4,579   1.5

Without knowing how complicated the joins involved in the various
statements are, I don't think I can get much info out of this.  I'm
not familiar with pgFouine, but maybe there is another way to
summarize its output that is more informative.

> Queries that took up the most time (N) ^
> Rank    Total duration  Times executed  Av. duration s  Query
> 1       3m39s           83,667           0.00           COMMIT;

So fsync's probably are not totally free on production, but I still
think they must be much cheaper than on your test box.

> 2       54.4s                2          27.18           SELECT ...

That is interesting.  Maybe those two queries are hammering everything
else to death.

> 3       41.1s              281           0.15           UPDATE ...
> 4       25.4s           18,960           0.00           UPDATE ...
> 5       21.9s   ...
> the 9th rank is already below 10 seconds of Total duration

But how does the 9th rank through the final rank, cumulatively, stack up?

In other words, how many query-seconds worth of time transpired during
the 137 wall seconds?  That would give an estimate of how many
simultaneously active connections the production server has.
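Just from the top five rows of the pgFouine table above (3m39s = 219
seconds), that back-of-the-envelope estimate comes out like this; the
remaining ranks would add a bit more:

```shell
# Sum the "Total duration" column for ranks 1-5 (in seconds), then
# divide by the 137 s of wall time: the result is a lower bound on
# the average number of simultaneously active backends.
awk 'BEGIN {
    total = 219 + 54.4 + 41.1 + 25.4 + 21.9   # query-seconds, ranks 1-5
    printf "%.2f\n", total / 137               # avg concurrently active
}'
# prints 2.64
```

So from those rows alone, fewer than three backends were busy on
average, which doesn't look like a 768-connection pileup.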

>> Each transaction has to update one of ten pgbench_branch rows, so you
>> can't have more than ten transactions productively active at any given
>> time, even though you have 768 connections.  So you need to jack up
>> the pgbench scale, or switch to using -N mode.
> Sorry for having omitted that detail. I had initialized pgbench with -i -s 100

Are you sure?  In an earlier email you reported the entire output of
pgbench, and it said the scale was 10.  Maybe you've changed it since
then?
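An easy way to settle it (hypothetical command; "bench" again stands
in for the actual database name) is to ask the database itself, since
pgbench -i creates one pgbench_branches row per scale unit:

```shell
# The row count of pgbench_branches is the effective scale factor
# the benchmark is actually running against.
psql -t -c "SELECT count(*) FROM pgbench_branches" bench
```

A count of 10 would mean the -i -s 100 initialization didn't take
effect, or that pgbench is pointed at a different database.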


