Re: spin_delay() for ARM

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: spin_delay() for ARM
Date: 2020-04-16 07:18:18
Message-ID: CAJ3gD9e86GY=QfyfZQkb11Z+CVWowDiGgGThzKKwHDGU9uA2yA@mail.gmail.com
Lists: pgsql-hackers

On Mon, 13 Apr 2020 at 20:16, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On Sat, 11 Apr 2020 at 04:18, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > I wrote:
> > > A more useful test would be to directly experiment with contended
> > > spinlocks. As I recall, we had some test cases laying about when
> > > we were fooling with the spin delay stuff on Intel --- maybe
> > > resurrecting one of those would be useful?
> >
> > The last really significant performance testing we did in this area
> > seems to have been in this thread:
> >
> > https://www.postgresql.org/message-id/flat/CA%2BTgmoZvATZV%2BeLh3U35jaNnwwzLL5ewUU_-t0X%3DT0Qwas%2BZdA%40mail.gmail.com
> >
> > A relevant point from that is Haas' comment
> >
> > I think optimizing spinlocks for machines with only a few CPUs is
> > probably pointless. Based on what I've seen so far, spinlock
> > contention even at 16 CPUs is negligible pretty much no matter what
> > you do. Whether your implementation is fast or slow isn't going to
> > matter, because even an inefficient implementation will account for
> > only a negligible percentage of the total CPU time - much less than 1%
> > - as opposed to a 64-core machine, where it's not that hard to find
> > cases where spin-waits consume the *majority* of available CPU time
> > (recall previous discussion of lseek).
>
> Yeah, will check if I find some machines with large cores.

I got hold of a VM with 32 CPUs (actually 16 cores, but with
hyperthreading it shows 32 CPUs).
It was an Intel Xeon, 3 GHz CPU, with 15 GB of available memory.
Hypervisor: KVM. Single NUMA node.
PG parameters changed: shared_buffers = 8GB; max_connections = 1000

I compared pgbench results on HEAD against a build with PAUSE removed,
like this:
perform_spin_delay(SpinDelayStatus *status)
{
- /* CPU-specific delay each time through the loop */
- SPIN_DELAY();

I ran with an increasing number of parallel clients:
pgbench -S -c $num -j $num -T 60 -M prepared
but couldn't find any significant change in the TPS numbers with or
without PAUSE:

Clients   HEAD (TPS)   Without_PAUSE (TPS)
      8       244446                247264
     16       399939                399549
     24       454189                453244
     32      1097592               1098844
     40      1090424               1087984
     48      1068645               1075173
     64      1035035               1039973
     96       976578                970699

Maybe it will indeed show a difference only at around 64 cores, or
perhaps a bare-metal machine would help; but as of now I haven't got
such a machine. Anyway, I thought why not archive the results with
whatever I have.

Not relevant to the PAUSE stuff: note that when the number of parallel
clients goes from 24 to 32 (which equals the number of machine CPUs),
the TPS shoots from 454189 to 1097592, which is more than double the
throughput with only about a 33% increase in parallel sessions. I was
not expecting this much gain because, in the contended scenario, the
pgbench processes are already taking around 20% of the total CPU time
of the pgbench run. Maybe later on I will get a chance to run with a
customized pgbench script that runs a server function which keeps on
running an index scan on pgbench_accounts, so as to make the pgbench
clients almost idle.

Thanks
-Amit Khandekar
