Re: Improving spin-lock implementation on ARM.

From: Krunal Bauskar <krunalbauskar(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving spin-lock implementation on ARM.
Date: 2020-11-30 06:19:25
Message-ID: CAB10pyZcDqfr7L_T27qcrVAC4PPipS62J0oQepvtrE=uQaO7Ag@mail.gmail.com
Lists: pgsql-hackers

On Mon, 30 Nov 2020 at 11:38, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Krunal Bauskar <krunalbauskar(at)gmail(dot)com> writes:
> > On Mon, 30 Nov 2020 at 10:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> The results I posted at [1] seem to contradict this for Apple's new
> >> machines.
>
> > For the results you saw on Mac-Mini was LSE enabled by default.
>
> Hmm, I don't know how to get Apple's clang to admit what its default
> settings are ... anybody?
>
> However, it does accept "-march=armv8-a+lse", and that seems to
> not be the default, because I get different results from my spinlock-
> pounding test than I did yesterday. Abbreviating into a table:
>
>               --- CFLAGS=-O2 ---     --- CFLAGS="-O2 -march=armv8-a+lse" ---
>
> TPS           HEAD    CAS patch      HEAD    CAS patch
>
> clients=1     2127    2174           2612    2722
> clients=2     1816     859            892     950
> clients=4      714     519            610     468
> clients=8        -       -            108     185
>

Thanks for trying this, Tom.

---------

Some of us may be surprised that enabling LSE causes a regression with
HEAD itself (1816 -> 892 at clients=2, 714 -> 610 at clients=4), even
though LSE is meant to improve performance. Unfortunately, that is not
always the case, at least based on my previous experience with LSE.
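
On the question of what the compiler enables by default, here is a minimal
sketch, assuming Apple's clang follows the ACLE predefined macros the way
upstream clang and GCC do. ACLE defines __ARM_FEATURE_ATOMICS when the
target supports LSE atomics, so a build without -march can report whether
LSE is on. The function name is hypothetical.

```c
#include <stdio.h>

/*
 * Returns 1 if this translation unit was compiled with LSE atomics enabled
 * (ACLE macro __ARM_FEATURE_ATOMICS), 0 otherwise. On non-ARM targets the
 * macro is not defined, so this returns 0.
 */
int
lse_enabled_at_compile_time(void)
{
#ifdef __ARM_FEATURE_ATOMICS
	return 1;
#else
	return 0;
#endif
}

/* Prints the result so two builds (with and without -march) can be compared. */
void
report_lse(void)
{
	printf("LSE atomics at compile time: %s\n",
		   lse_enabled_at_compile_time() ? "yes" : "no");
}
```

The same information should also be visible by dumping the predefined
macros (for example "clang -dM -E -x c /dev/null") and looking for
__ARM_FEATURE_ATOMICS, though I have not checked this on Apple's toolchain.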

I am still wondering why CAS is slower than TAS on the M1. What is special
about the M1 that other ARM architectures have not picked up?

Tom, sorry to bother you again, but this is raising a lot of curiosity
about the M1.
Whenever you get time, could you do some micro-benchmarking on the M1 (to
understand TAS vs CAS)?
Also, could you share the assembly code that is emitted for TAS vs CAS?
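
To make the comparison concrete, here is a minimal standalone sketch of the
two acquire primitives, assuming the GCC/clang __sync builtins. It is not the
actual s_lock.h code or the CAS patch; the names slock_t, tas_swap, tas_cas,
spin_unlock and check_tas are used here only for illustration.

```c
#include <assert.h>

/* Illustrative lock word; an int-sized flag, 0 = free, 1 = held. */
typedef volatile int slock_t;

/*
 * TAS flavour: unconditional atomic exchange.
 * Returns 0 if we acquired the lock, nonzero if it was already held.
 */
int
tas_swap(slock_t *lock)
{
	return __sync_lock_test_and_set(lock, 1);
}

/*
 * CAS flavour: store 1 only if the lock word is currently 0.
 * Returns 0 if we acquired the lock, nonzero if it was already held.
 */
int
tas_cas(slock_t *lock)
{
	return !__sync_bool_compare_and_swap(lock, 0, 1);
}

/* Release with release semantics; writes 0 to the lock word. */
void
spin_unlock(slock_t *lock)
{
	__sync_lock_release(lock);
}

/* Single-threaded sanity check that a primitive behaves like a lock. */
void
check_tas(int (*tas) (slock_t *))
{
	slock_t		lock = 0;

	assert(tas(&lock) == 0);	/* free lock: acquired */
	assert(tas(&lock) != 0);	/* held lock: refused */
	spin_unlock(&lock);
	assert(tas(&lock) == 0);	/* free again: acquired */
}
```

Compiling this file with -S, once with plain -O2 and once with
-O2 -march=armv8-a+lse, should show what each primitive turns into. I would
expect an exclusive load/store loop without LSE and a single swap or
compare-and-swap instruction with it, but that is worth confirming on the
actual compiler.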

>
> Unfortunately, that still doesn't lead me to think that either LSE
> or CAS are net wins on this hardware. It's quite clear that LSE
> makes the uncontended case a good bit faster, but the contended case
> is a lot worse, so is that really a tradeoff we want?
>
> > * I would also suggest if possible try with higher scalability (more than 4
> > to check if with increase scalability CAS out-perform).
>
> As I said yesterday, running more than 4 processes is just going
> to bring the low-performance cores into the equation, which is likely
> to swamp any interesting comparison. I did run the test with "-c 8"
> today, as shown in the right-hand columns, and the results seem
> to bear that out.
>
> regards, tom lane
>

--
Regards,
Krunal Bauskar
