
From: Nils Goroll <slink(at)schokola(dot)de>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Update on the spinlock->pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
Date: 2012-07-01 21:28:30
Message-ID: 4FF0C0FE.3010608@schokola.de
Lists: pgsql-hackers

Hi Jeff,

>>> It looks like the hacked code is slower than the original. That
>>> doesn't seem so good to me. Am I misreading this?
>>
>> No, you are right - in a way. This is not about maximizing tps, this is about
>> maximizing efficiency under load situations
>
> But why wouldn't this maximized efficiency present itself as increased TPS?

Because the latency of lock acquisition influences TPS, but that is only marginally
related to the cost, in CPU cycles, of acquiring the locks.

See my posting of Sun, 01 Jul 2012 21:02:05 +0200 for an overview of my
understanding.

>>> Also, 20 transactions per connection is not enough of a run to make
>>> any evaluation on.
>>
>> As you can see I've repeated the tests 10 times. I've tested slight variations
>> as mentioned above, so I was looking for quick results with acceptable variation.
>
> Testing it 10 times doesn't necessarily improve things.

My intention was to average out the imperfections of rusage accounting, because
I was mainly interested in lowering rusage, not in maximizing TPS.

Yes, to get reliable results I'd have to run longer tests, but interestingly,
with respect to the differences between unpatched and patched, the results from
my quick tests already approximated those from the much larger tests Robert has
run.

> You should use at least -T30, rather than -t20.

Thanks for the advice - it is really appreciated, and I will take it when I run
more tests.

But I don't understand yet how best to provoke high spinlock concurrency with
pgbench. Or are there any other test tools out there for this case?

> Anyway, your current benchmark speed of around 600 TPS over such a
> short time period suggests you are limited by fsyncs.

Definitely. I described the setup in my initial posting ("why roll-your-own
s_lock? / improving scalability" - Tue, 26 Jun 2012 19:02:31 +0200).

> pgbench does as long as that is the case. You could turn --fsync=off,
> or just change your benchmark to a read-only one like -S, or better
> the -P option I've been trying to get into pgbench.

I don't like to make assumptions which I haven't validated. The system showing
the behavior is designed to write to persistent SSD storage in order to reduce
the risk of data loss from a (BBU) cache failure. Running a test with fsync=off
would diverge even further from reality.

> Does your production server have fast fsyncs (BBU) while your test
> server does not?

No, we're writing directly to SSDs (ref: initial posting).

> The users probably don't care about the load average. Presumably they
> are unhappy because of lowered throughput (TPS) or higher peak latency
> (-l switch in pgbench). So I think the only use of load average is to
> verify that your benchmark is nothing like your production workload.
> (But it doesn't go the other way around, just because the load
> averages are similar doesn't mean the actual workloads are.)

Fully agree.

>> Rank  Total duration  Times executed  Av. duration (s)  Query
>> 1     3m39s           83,667          0.00              COMMIT;
>
> So fsync's probably are not totally free on production, but I still
> think they must be much cheaper than on your test box.

Oh, the two are the same machine. I ran the tests on the prod machine during
quiet periods.

>> 2     54.4s           2               27.18             SELECT ...
>
> That is interesting. Maybe those two queries are hammering everything
> else to death.

With 64 cores?

I should have mentioned that these were simply the result of a missing index
when the data was collected.

> But how does the 9th rank through the final rank, cumulatively, stack up?
>
> In other words, how many query-seconds worth of time transpired during
> the 137 wall seconds? That would give an estimate of how many
> simultaneously active connections the production server has.

Sorry, I should have given you the stats from pgFouine:

Number of unique normalized queries: 507
Number of queries: 295,949
Total query duration: 8m38s
First query: 2012-06-23 14:51:01
Last query: 2012-06-23 14:53:17
Query peak: 6,532 queries/s at 2012-06-23 14:51:33

>> Sorry for having omitted that detail. I had initialized pgbench with -i -s 100
>
> Are you sure? In an earlier email you reported the entire output of
> pgbench, and it said it was using 10. Maybe you've changed it since
> then...

Good catch - I was wrong in the email you quoted. Sorry.

-bash-4.1$ rsync -av --delete /tmp/test_template_data/ /tmp/data/
...
-bash-4.1$ ./postgres -D /tmp/data -p 55502 &
[1] 38303
-bash-4.1$ LOG: database system was shut down at 2012-06-26 23:18:42 CEST
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
-bash-4.1$ ./psql -p 55502
psql (9.1.3)
Type "help" for help.
postgres=# select count(*) from pgbench_branches;
count
-------
10
(1 row)

Thank you very much, Jeff! One question remains: do we really have all we
need to provoke very high lock contention?

Nils
