Re: mosbench revisited

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: mosbench revisited
Date: 2011-08-04 01:16:14
Message-ID: CA+TgmoYeS+RgQvnQEYNpA7JCjjd_0SkjSwF29Lrsy+vkGxcvrQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 3, 2011 at 5:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> That still seems utterly astonishing to me.  We're touching each of
> those files once per query cycle; a cycle that contains two message
> sends, who knows how many internal spinlock/lwlock/heavyweightlock
> acquisitions inside Postgres (some of which *do* contend with each
> other), and a not insignificant amount of plain old computing.
> Meanwhile, this particular spinlock inside the kernel is protecting
> what, a single doubleword fetch?  How is that the bottleneck?

Spinlocks seem to have a very ugly "tipping point". When I tested
pgbench -S on a 64-core system with the lazy vxid patch applied and a
patch to use random_r() in lieu of random(), the amount of system time
used per SELECT-only transaction at 48 clients was 3.59 times as much
as it was at 4 clients. The amount used per transaction at 52 clients
was 3.63 times the amount used at 48 clients, and the amount used at
56 clients was 3.25 times the amount used at 52 clients. You can see
the throughput graph starting to flatten out in the 32-44 client
range, but it's not particularly alarming. Once you pass that point,
however, things get out of control in a real hurry: a few more
clients and the machine is basically doing nothing but spin.
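
To make that tipping point a bit more concrete, here is a minimal,
purely illustrative C sketch of a test-and-set spinlock (not
PostgreSQL's s_lock, and not the kernel lock in question): every
waiter burns a full core while it spins, so once waiters start piling
up faster than the lock holder can drain them, each additional client
adds almost pure spin time rather than throughput.

/*
 * Illustrative only: a naive test-and-set spinlock, not PostgreSQL's
 * s_lock and not the kernel's implementation.  Each waiting thread
 * burns a core while it spins; past the point where waiters pile up,
 * extra threads add almost pure spin time rather than throughput.
 * Build with: cc -std=c11 -pthread spin.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS 8
#define ITERS    1000000

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long counter;

static void *
worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < ITERS; i++)
    {
        /*
         * Every failed test-and-set is wasted CPU; this is the part
         * that explodes once the lock becomes heavily contended.
         */
        while (atomic_flag_test_and_set_explicit(&lock,
                                                 memory_order_acquire))
            ;
        counter++;              /* trivially short critical section */
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }
    return NULL;
}

int
main(void)
{
    pthread_t tids[NTHREADS];

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    printf("counter = %ld\n", counter);
    return 0;
}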

> I am wondering whether kernel spinlocks are broken.

I don't think so. Stefan Kaltenbrunner had one profile where he
showed something like sixty or eighty percent of the usermode CPU time
in s_lock. I didn't have access to that particular hardware, but the
testing I've done strongly suggests that most of that was the
SInvalReadLock spinlock. And before I patched pgbench to avoid
calling random(), random() itself was doing the same thing -
literally flattening a 64-core box with every client fighting over a
single futex that normally costs almost nothing. (That one wasn't
quite as bad, because the futex actually deschedules the waiters, but
it was still bad.) I'm not really sure why it shakes out this way
(birthday paradox?), but having seen the effect several times now,
I'm disinclined to believe it's an artifact.
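
For what it's worth, here is a minimal sketch of the kind of change
I'm describing for pgbench - not the actual patch, just the general
shape of it, with illustrative names (pg_srandom/pg_random). glibc's
random() serializes all callers on one process-wide lock (hence the
futex traffic), while random_r() operates on caller-supplied state,
so each thread can keep its own generator and never touch anything
shared.

/*
 * Sketch only - not the actual pgbench patch.  The names pg_srandom()
 * and pg_random() are illustrative.  random() protects glibc's shared
 * generator state with a process-wide lock, so heavy callers all pile
 * up on the same futex; random_r() works on caller-supplied state, so
 * there is nothing to contend on.
 */
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* in pgbench, each thread would carry its own copy of these */
static struct random_data rand_state;
static char rand_statebuf[128];

static void
pg_srandom(unsigned int seed)
{
    /* struct random_data must be zeroed before initstate_r() */
    memset(&rand_state, 0, sizeof(rand_state));
    initstate_r(seed, rand_statebuf, sizeof(rand_statebuf), &rand_state);
}

static long
pg_random(void)
{
    int32_t result;

    random_r(&rand_state, &result);
    return result;
}

int
main(void)
{
    pg_srandom(12345);
    for (int i = 0; i < 5; i++)
        printf("%ld\n", pg_random());
    return 0;
}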

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
