Re: Improving spin-lock implementation on ARM.

From: Krunal Bauskar <krunalbauskar(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving spin-lock implementation on ARM.
Date: 2020-11-26 12:40:31
Message-ID: CAB10pyZfSW2ayVcUpqF8jf5TTxtY60jfZvKKYzyChfN55o6hXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 26 Nov 2020 at 16:02, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 26/11/2020 06:30, Krunal Bauskar wrote:
> > Improving spin-lock implementation on ARM.
> > ------------------------------------------------------------
> >
> > * Spin-Lock is known to have a significant effect on performance
> > with increasing scalability.
> >
> > * Existing Spin-Lock implementation for ARM is sub-optimal due to
> > use of TAS (test and swap)
> >
> > * TAS is implemented on ARM as load-store so even if the lock is not
> free,
> > store operation will execute to replace the same value.
> > This redundant operation (mainly store) is costly.
> >
> > * CAS is implemented on ARM as load-check-store-check that means if the
> > lock is not free, check operation, post-load will cause the loop to
> > return there-by saving on costlier store operation. [1]
>
> Can you add some code comments to explain that why CAS is cheaper than
> TAS on ARM?
>

1. As Alexey too pointed out in followup email
CAS = load value -> check if the value is expected -> if yes then replace
(store new value) -> else exit/break
TAS = load value -> store new value

This means each TAS would be converted to 2 operations that are LOAD and
STORE and of-course
STORE is costlier in terms of latency. CAS ensures optimization in this
regard with an early check.

Let's look at some micro-benchmarking. I implemented a simple spin-loop
using both approaches and
as expected with increase scalability, CAS continues to out-perform TAS by
a margin up to 50%.

---- TAS ----
Running 128 parallelism
Elapsed time: 1.34271 s
Running 256 parallelism
Elapsed time: 3.6487 s
Running 512 parallelism
Elapsed time: 11.3725 s
Running 1024 parallelism
Elapsed time: 43.5376 s
---- CAS ----
Running 128 parallelism
Elapsed time: 1.00131 s
Running 256 parallelism
Elapsed time: 2.53202 s
Running 512 parallelism
Elapsed time: 7.66829 s
Running 1024 parallelism
Elapsed time: 22.6294 s

This could be also observed from the perf profiling

TAS:
15.57 │44: ldxr w0, [x19]
83.93 │ stxr w1, w21, [x19]

CAS:
81.29 │58: ↓ b.ne cc
....
9.86 │cc: ldaxr w0, [x22]
8.84 │ cmp w0, #0x0
│ ↑ b.ne 58
│ stxr w1, w20, [x22]

*In TAS: STORE is pretty costly.*

2. I have added the needed comment in the patch. Updated patch attached.

----------------------
Thanks for taking look at this and surely let me know if any more info is
needed.

>
> Is there some official ARM documentation, like a programmer's reference
> manual or something like that, that would show a reference
> implementation of a spinlock on ARM? It would be good to refer to an
> authoritative source on this.
>
> - Heikki
>

--
Regards,
Krunal Bauskar

Attachment Content-Type Size
cas-vs-tas.cc text/x-c++src 2.0 KB
cas.patch text/x-patch 1.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-11-26 12:43:12 Re: Improper use about DatumGetInt32
Previous Message Fujii Masao 2020-11-26 12:22:11 Re: [PATCH] Add features to pg_stat_statements