Re: Improving spin-lock implementation on ARM.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Krunal Bauskar <krunalbauskar(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving spin-lock implementation on ARM.
Date: 2020-11-28 02:35:57
Message-ID: 741389.1606530957@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> It might be that this hardware is capable of showing a difference with a
> better-tuned pgbench test, but with an untuned pgbench run, we just aren't
> sufficiently sensitive to the spinlock properties. (Which I guess is good
> news, really.)

It occurred to me that if we don't insist on a semi-realistic test case,
it's not that hard to just pound on a spinlock and see what happens.
I made up a simple C function (attached) to repeatedly call
XLogGetLastRemovedSegno, which is basically just a spinlock
acquire/release. Using this as a "transaction":

$ cat bench.sql
select drive_spinlocks(50000);
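Outside PostgreSQL, the same kind of microbenchmark can be sketched in a few lines of C. This is a hypothetical, standalone analogue of drive_spinlocks(). It is not the attached spinlocktest.c, which calls XLogGetLastRemovedSegno. The names below are made up for illustration, and the sketch assumes a GCC or Clang compiler that provides the `__sync` atomic builtins.

```c
#include <assert.h>

/* Hypothetical stand-in for PostgreSQL's slock_t: 0 = free, 1 = held. */
typedef volatile int slock_sketch_t;

/* Test-and-set acquire: __sync_lock_test_and_set stores 1 and returns
 * the previous value, so 0 means we now hold the lock.  While someone
 * else holds it, wait on plain loads rather than repeated atomic swaps. */
static void
sketch_acquire(slock_sketch_t *lock)
{
    while (__sync_lock_test_and_set(lock, 1) != 0)
    {
        while (*lock != 0)
            ;
    }
}

/* Release: store 0 with release semantics. */
static void
sketch_release(slock_sketch_t *lock)
{
    __sync_lock_release(lock);
}

/* Hypothetical analogue of drive_spinlocks(n): n back-to-back
 * acquire/release pairs around a trivial critical section. */
static long
drive_spinlocks_sketch(slock_sketch_t *lock, long n)
{
    long done = 0;

    for (long i = 0; i < n; i++)
    {
        sketch_acquire(lock);
        done++;
        sketch_release(lock);
    }
    assert(*lock == 0);         /* the lock is free again afterwards */
    return done;
}
```

Called as `drive_spinlocks_sketch(&lock, 50000)` with `lock` initialised to 0, it performs the same 50000 acquire/release pairs per call as one pgbench "transaction" above, minus all of the query overhead.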

I get this with HEAD:

$ pgbench -f bench.sql -n -T 60 -c 1 bench
transaction type: bench.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 127597
latency average = 0.470 ms
tps = 2126.479699 (including connections establishing)
tps = 2126.595015 (excluding connections establishing)

$ pgbench -f bench.sql -n -T 60 -c 2 bench
transaction type: bench.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
duration: 60 s
number of transactions actually processed: 108979
latency average = 1.101 ms
tps = 1816.051930 (including connections establishing)
tps = 1816.150556 (excluding connections establishing)

$ pgbench -f bench.sql -n -T 60 -c 4 bench
transaction type: bench.sql
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 1
duration: 60 s
number of transactions actually processed: 42862
latency average = 5.601 ms
tps = 714.202152 (including connections establishing)
tps = 714.237301 (excluding connections establishing)

(With only 4 high-performance cores, it's probably not
interesting to go further; involving the slower cores
will just confuse matters.) And this with the patch:

$ pgbench -f bench.sql -n -T 60 -c 1 bench
transaction type: bench.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 130455
latency average = 0.460 ms
tps = 2174.098284 (including connections establishing)
tps = 2174.217097 (excluding connections establishing)

$ pgbench -f bench.sql -n -T 60 -c 2 bench
transaction type: bench.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
duration: 60 s
number of transactions actually processed: 51533
latency average = 2.329 ms
tps = 858.765176 (including connections establishing)
tps = 858.811132 (excluding connections establishing)

$ pgbench -f bench.sql -n -T 60 -c 4 bench
transaction type: bench.sql
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 1
duration: 60 s
number of transactions actually processed: 31154
latency average = 7.705 ms
tps = 519.116788 (including connections establishing)
tps = 519.144375 (excluding connections establishing)
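The following small C sketch puts these tps figures on a common scale. The helper names are hypothetical, and the numbers are the "excluding connections establishing" tps values reported above. It computes the patched build's throughput relative to HEAD at each client count, plus a rough cost per acquire/release pair. Since each transaction is `select drive_spinlocks(50000)`, one transaction is 50000 pairs plus query overhead, so the per-pair figure is only an upper bound.

```c
#include <assert.h>

/* Each pgbench "transaction" is select drive_spinlocks(50000). */
#define PAIRS_PER_XACT 50000.0

/* tps (excluding connections establishing) as reported above. */
struct run
{
    int    clients;
    double tps_head;
    double tps_patch;
};

static const struct run runs[] = {
    {1, 2126.595015, 2174.217097},
    {2, 1816.150556,  858.811132},
    {4,  714.237301,  519.144375},
};

/* Patched throughput as a fraction of HEAD's (1.0 = no change). */
static double
patch_vs_head(const struct run *r)
{
    assert(r->tps_head > 0.0);  /* avoid dividing by zero */
    return r->tps_patch / r->tps_head;
}

/* Upper bound on ns per acquire/release pair for a single client:
 * the elapsed time also covers per-query overhead. */
static double
ns_per_pair(double tps)
{
    assert(tps > 0.0);
    return 1e9 / (tps * PAIRS_PER_XACT);
}
```

For HEAD with one client, `ns_per_pair` comes to about 9.4 ns, which agrees with the reported 0.470 ms average latency divided by 50000.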

So at least on Apple's hardware, it seems like the CAS
implementation might be a shade faster when uncontended
(about 2% more tps with one client), but it's very clearly
worse when there is contention for the spinlock (less than
half of HEAD's tps with two clients, and under three quarters
with four). That's interesting, because the argument
that CAS should involve strictly less work seems valid ...
but that's what I'm getting.
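The two variants being compared differ, roughly, in how a single lock attempt is made. The sketch below assumes, as the thread's subject and the CAS discussion suggest, that HEAD's ARM path uses GCC's `__sync_lock_test_and_set` and that the patch substitutes a compare-and-swap. Here `__sync_val_compare_and_swap` stands in for that CAS, and all other names (`tas_attempt`, `cas_attempt`, `run_contention`, ...) are hypothetical, so this is not the patch itself. The "strictly less work" argument is presumably that a CAS attempt on an already-held lock leaves it unchanged, whereas TAS unconditionally swaps in a 1.

```c
#include <assert.h>
#include <pthread.h>

/* One lock attempt: returns the lock's previous value (0 = acquired). */
typedef int (*attempt_fn)(volatile int *lock);

/* TAS flavour: always stores 1, returning the old value. */
static int
tas_attempt(volatile int *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/* CAS flavour: stores 1 only if the lock currently holds 0, and returns
 * the old value either way, so a busy lock is left unchanged. */
static int
cas_attempt(volatile int *lock)
{
    return __sync_val_compare_and_swap(lock, 0, 1);
}

/* Spin until an attempt succeeds, waiting on plain loads in between. */
static void
spin_acquire(volatile int *lock, attempt_fn attempt)
{
    while (attempt(lock) != 0)
        while (*lock != 0)
            ;
}

static void
spin_release(volatile int *lock)
{
    __sync_lock_release(lock);
}

/* Hypothetical contention harness: every thread bumps one counter. */
struct contend_arg
{
    volatile int *lock;
    long       *counter;
    long        iters;
    attempt_fn  attempt;
};

static void *
contend_worker(void *p)
{
    struct contend_arg *a = p;

    for (long i = 0; i < a->iters; i++)
    {
        spin_acquire(a->lock, a->attempt);
        (*a->counter)++;        /* protected by the spinlock */
        spin_release(a->lock);
    }
    return NULL;
}

/* Run nthreads (1..16) workers against one lock and return the final
 * count, which is nthreads * iters if the lock gives mutual exclusion. */
static long
run_contention(attempt_fn attempt, int nthreads, long iters)
{
    pthread_t   tids[16];
    volatile int lock = 0;
    long        counter = 0;
    struct contend_arg arg = {&lock, &counter, iters, attempt};

    assert(nthreads >= 1 && nthreads <= 16);
    for (int i = 0; i < nthreads; i++)
    {
        int rc = pthread_create(&tids[i], NULL, contend_worker, &arg);

        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Built with something like `cc -O2 -pthread` and timed around `run_contention()` at 1, 2 and 4 threads, this gives a crude, PostgreSQL-free way to check whether the same pattern shows up on other ARM machines. The sketch by itself says nothing about which flavour wins.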

It might be useful to try this on other ARM platforms,
but I lack the energy right now (plus the only other
thing I've got is a Raspberry Pi, which might not be
something we particularly care about performance-wise).

regards, tom lane

Attachment: spinlocktest.c (text/x-c, 499 bytes)
