From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Krunal Bauskar <krunalbauskar(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Improving spin-lock implementation on ARM. |
Date: | 2020-11-28 10:31:28 |
Message-ID: | CAPpHfdt5b=5NdWT=gTeYVaWrzTUGSgtwVTPJJanhG8EzHeE6ew@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Nov 28, 2020 at 5:36 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> So at least on Apple's hardware, it seems like the CAS
> implementation might be a shade faster when uncontended,
> but it's very clearly worse when there is contention for
> the spinlock. That's interesting, because the argument
> that CAS should involve strictly less work seems valid ...
> but that's what I'm getting.
>
> It might be useful to try this on other ARM platforms,
> but I lack the energy right now (plus the only other
> thing I've got is a Raspberry Pi, which might not be
> something we particularly care about performance-wise).
I guess that might depend on the implementation of CAS and TAS. I bet
using CAS in the spinlock gives an advantage when ldxr/stxr are used, but
not when swpal/casa are used. I found out that I can force clang to
use swpal/casa by setting "-march=armv8-a+lse". I'm going to run
some experiments on a multicore AWS Graviton2 instance with different
atomic implementations.
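
For readers following along, here is a minimal sketch of the two lock-acquire flavors being compared, written with the GCC/clang __sync builtins. It is not the PostgreSQL source or the patch under discussion: the names slock_t, tas_swap, tas_cas and spin_unlock are hypothetical and chosen only for illustration. How each builtin is lowered (an ldxr/stxr-style loop or a single LSE instruction) depends on the compiler and the -march setting.

```c
#include <assert.h>

/* Hypothetical lock word for illustration: 0 = free, nonzero = held. */
typedef volatile int slock_t;

/*
 * TAS flavor: unconditional atomic exchange.
 * Always writes 1 and returns the previous value, so 0 means "acquired".
 */
static inline int
tas_swap(slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/*
 * CAS flavor: write 1 only if the word is currently 0.
 * Returns 0 when the lock was acquired, nonzero otherwise.
 */
static inline int
tas_cas(slock_t *lock)
{
    return !__sync_bool_compare_and_swap(lock, 0, 1);
}

/* Release with release semantics; the assert is a debug-only sanity check. */
static inline void
spin_unlock(slock_t *lock)
{
    assert(*lock != 0);
    __sync_lock_release(lock);
}
```

On an AArch64 machine, compiling this file with and without "-march=armv8-a+lse" and inspecting the output of "clang -O2 -S" should show which instructions the compiler actually picked for each function.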
------
Regards,
Alexander Korotkov