| From: | Greg Burd <greg(at)burd(dot)me> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dave Cramer <davecramer(at)gmail(dot)com> |
| Subject: | Re: [PATCH] Fix ARM64/MSVC atomic memory ordering issues on Win11 by adding explicit DMB barriers |
| Date: | 2025-11-23 20:41:31 |
| Message-ID: | 8EAE8CAE-FAAC-474A-A611-FEE2809EBE6D@greg.burd.me |
| Lists: | pgsql-hackers |
On Nov 23 2025, at 10:55 am, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2025-11-22 16:43:30 -0500, Greg Burd wrote:
>> With the new MSVC compiler flag Andres mentioned (/arch:armv9.4) I only
>> had to update the S_UNLOCK() macro, the compiler did the rest correctly
>> AFAICT.
>
> Just to be clear - the flag shouldn't be necessary for things to work
> correctly. I was only using it to have godbolt inline the intrinsics, rather
> than have them generate an out-of-line function call that I couldn't easily
> inspect. I'm fairly sure that the out-of-line functions also have the relevant
> barriers.
Well, I learned something new about MSVC (again) today. Thanks Andres
for pointing that out. I fiddled with godbolt a bit, and indeed on ARM64 the only thing that flag (/arch:armv9.4) does is replace the out-of-line function call to the intrinsic with an inlined expansion of it.
Without:

    bl _InterlockedCompareExchange

With:

    mov w10,#2
    str w8,[sp]
    mov x9,sp
    casal w8,w10,[x9]
Good to know, and yes, the generated assembly does include the correct instruction sequence; casal is the compare-and-swap form with both acquire and release semantics, so the ordering is carried by the instruction itself.
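For anyone who wants to reproduce the experiment, a probe along these
lines is enough (a hypothetical reduction, not the exact code I fed
godbolt; compile with and without /arch:armv9.4 and compare the output):

    #include <intrin.h>

    long
    cas_probe(long expected)
    {
        volatile long v = expected;

        /* Without /arch:armv9.4 this compiles to
         * "bl _InterlockedCompareExchange"; with the flag, MSVC inlines
         * the casal sequence shown above. */
        return _InterlockedCompareExchange(&v, 2, expected);
    }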
Build issues aside, the one thing I can put my finger on as a "fix" is the change from _ReadWriteBarrier() to __dmb(_ARM64_BARRIER_SY).
The docs say:
    The _ReadWriteBarrier intrinsic limits the compiler optimizations
    that can remove or reorder memory accesses across the point of the
    call.
Note that it says "compiler optimizations", not "memory barrier": the intrinsic constrains only the compiler, not the CPU.
So, this macro change:
#define S_UNLOCK(lock) \
- do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
+ do { __dmb(_ARM64_BARRIER_SY); (*(lock)) = 0; } while (0)
makes more sense. It replaces a compiler-only barrier with a full hardware memory barrier, which is what a spinlock release on ARM64 actually needs: ARM64 is weakly ordered, so without the DMB the CPU may make the store to the lock word visible before earlier stores from the critical section.
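To make that concrete, here is a minimal sketch of the two unlock
variants (hypothetical function names, not the actual s_lock.h
definitions):

    #include <intrin.h>

    typedef volatile long slock_t;

    /* Old: _ReadWriteBarrier() only stops the compiler from reordering
     * across this point; no instruction is emitted, so a weakly
     * ordered ARM64 core may still make the store to *lock visible
     * before earlier stores from the critical section. */
    static void
    s_unlock_compiler_only(slock_t *lock)
    {
        _ReadWriteBarrier();
        *lock = 0;
    }

    /* New: __dmb(_ARM64_BARRIER_SY) emits an actual "dmb sy", ordering
     * the unlocking store after everything the critical section wrote. */
    static void
    s_unlock_dmb(slock_t *lock)
    {
        __dmb(_ARM64_BARRIER_SY);
        *lock = 0;
    }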
When I remove the compiler flag from the patch and keep the S_UNLOCK() change, the tests pass. I think we've found the critical fix. I'll update the patch set.
>> @@ -2509,7 +2513,10 @@ int main(void)
>> }
>> '''
>>
>> - if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc',
>> + if cc.get_id() == 'msvc'
>> + cdata.set('USE_ARMV8_CRC32C', 1)
>> + have_optimized_crc = true
>
> Should have a comment explaining why we can do this unconditionally...
Sure, the more I look at this the more I want to clean it up a bit more.
> Greetings,
>
> Andres Freund
Thanks again for the helpful nudge, Andres. Best,
-greg