ARM/ARM64 SpinLockAcquire and SpinLockRelease are not full barriers

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: ARM/ARM64 SpinLockAcquire and SpinLockRelease are not full barriers
Date: 2025-05-13 16:22:29
Message-ID: 24e4cce0-e735-4de1-ab1d-862d55581abe@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Good day.

During conversation about other patch, Andres pointed out SpinLockAcquire
have semantic of full memory barrier [1] .

spin.h also states:

> Load and stores operation in calling code are guaranteed not to be
reordered with respect to these operations, because they include a compiler
barrier.

But on ARM/ARM64 platform tas in s_lock.h is implemented as simple call to
__sync_lock_test_and_set, and . And GCC's documentation states it has only
Acquire semantic, and __sync_lock_release (used as implementation of
SpinLockRelease) has only Release semantic [2].

Compiling and disassembling postgresql with -O2 optimization level on arm64
we can see that SpinLockAquire is compiled down to call to
__aarch64_swp4_sync (in optimistic case) which has disassemble:

0000000000662cc0 <__aarch64_swp4_sync>:
662cc0: d503245f bti c
662cc4: 90001bf0 adrp x16, 9de000 <hist_start+0x3d18>
662cc8: 39582210 ldrb w16, [x16, #1544]
662ccc: 34000070 cbz w16, 662cd8
662cd0: b8a08020 swpa w0, w0, [x1]
662cd4: d65f03c0 ret
662cd8: 2a0003f0 mov w16, w0
662cdc: 885f7c20 ldxr w0, [x1]
662ce0: 88117c30 stxr w17, w16, [x1]
662ce4: 35ffffd1 cbnz w17, 662cdc
662ce8: d5033bbf dmb ish
662cec: d65f03c0 ret

Here we see 'swpa' instruction which has only acquire semantic [3].

And SpinLockRelease generates 'stlr' instruction, which has release
semantic, iirc.

Godbolt shows noticable difference between __sync_lock_test_and_set vs
__atomic_exchange_n(ATOMIC_SEQ_CST) (and __sync_lock_release vs
__atomic_store_n(ATOMIC_SEQ_CST)) using GCC even with march=armv7 [4]:
- __atomic_exchange_n and __atomic_store_n has 'dmb ish' instruction both
before and after load/store operations,
- but __sync_lock_test_and_set has 'dmb ish' only after load/store
operations and __sync_lock_release only before store operation. (You can
see same pattern in __aarch64_swp4_sync above in case 'swpa' instruction is
not present).

Therefore I think neither SpinLockAcquire nor SpinLockRelease have no full
memory barrier semantic on ARM/ARM64 at the moment when compiled with GCC.

Surprisingly, Clang puts 'dmb ish' both before and after load/store at
__sync_lock_test_and_set, but does not the same for __sync_lock_release.

As a simple fix I suggest to put __sync_synchronize before
__sync_lock_test_and_set and after __sync_lock_release.

Initially I wanted to use __atomic_exchange_n and __atomic_store_n. But in
absence of support for atomic instructions, their inner loop doesn't look
to be safe for reordering since it uses ldar+stlr loop:
- looks like previous operations could be reordered after 'ldar'.
- similarly following operations could be reordered before 'stlr'.
There's very relevant discussion at GCC's bugzilla [5].
Probably this hypothesis is not valid. I believe we have to ask someone
closely familiar with ARM internals to be sure.

# Alternative.

As an alternative, we may fix comments about
SpinLockAcquire/SpinLockRelease to state they provide acquire and release
barrier semantics only. And then check all their uses and set appropriate
memory barriers where their usage as full memory barriers is detected (as
in [1]).

[1]
https://postgr.es/m/5ccgykypol3azijw2chqnpg3rhuwjtwsmbfs3pgcqm7fu6laus%40wppo6zcfszay
[2]
https://gcc.gnu.org/onlinedocs/gcc-15.1.0/gcc/_005f_005fsync-Builtins.html#index-_005f_005fsync_005flock_005ftest_005fand_005fset
[3]
https://developer.arm.com/documentation/100069/0606/Data-Transfer-Instructions/SWPA--SWPAL--SWP--SWPL--SWPAL--SWP--SWPL
[4] https://godbolt.org/z/h5P9fjzWd
[5] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--
regards
Yura Sokolov aka funny-falcon

Attachment Content-Type Size
v1-0001-Make-SpinLockAcquire-and-SpinLockRelease-full-mem.patch text/x-patch 2.1 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2025-05-13 16:57:35 Re: queryId constant squashing does not support prepared statements
Previous Message Andres Freund 2025-05-13 15:54:42 Re: fix: propagate M4 env variable to flex subprocess