| From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: vectorized CRC on ARM64 |
| Date: | 2026-01-12 09:27:00 |
| Message-ID: | CANWCAZYAKWTFW_1byMq44t57urk6YKoDh1z1Zp6fk5-4dOPy-w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, May 14, 2025 I wrote:
>
> We did something similar for x86 for v18, and here is some progress
> towards Arm support.
Coming back to this, since there's been recent interest in Arm support.
v2 is a rebase, with a few changes.
- I simplified it by leaving out the inlining for "assume CRC" builds,
since I wanted to avoid alignment considerations if I can. I think
always indirecting through a pointer will have less risk of
regressions in a realistic setting than for x86 since Arm chips
typically have low latency for carryless multiplication instructions.
With just a bit of code we can still use the direct call for small
constant inputs, so I did that to avoid regressions under WAL insert
lock.
- One coding idiom for a vector literal in the generated code was
giving pgindent indigestion, I so rewrote it using Neon intrinsics and
verified it in Godbolt.
> 0002: Like 3c6e8c12389 and in fact uses the same program to generate
> the code, by specifying Neon instructions with the Arm "crypto"
> extension instead. There are some interesting differences from x86
> here as well:
> - The upstream implementation chose to use inline assembly instead of
> intrinsics for some reason. I initially thought that was a way to get
> broader compiler support, but it turns out you still need to pass the
> relevant flags to get the assembly to link.
To follow-up for curiosity's sake, [1] says that Apple chips can issue
PMULL + EOR as a single uop if they are next to each other in the
instruction stream.
> - I only have Meson support for now, since I used MacOS on CI to test.
> That OS and compiler combination apparently targets the CRC extension,
> but the PMULL instruction runtime check uses Linux-only headers, I
> believe, so previously I hacked the choose function to return true for
> testing. The choose function in 0002 is untested in this form.
This is still true, but now the CI hack lives in a separate
not-for-commit patch for clarity.
autoconf support is a WIP, and I will share that after I do some
testing on an Arm Linux instance.
[1] https://dougallj.github.io/applecpu/firestorm.html
--
John Naylor
Amazon Web Services
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Compute-CRC32C-on-ARM-using-the-Crypto-Extension-.patch | text/x-patch | 9.5 KB |
| v2-0002-Force-testing-on-MacOS-CI-XXX-not-for-commit.patch | text/x-patch | 790 bytes |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Banck | 2026-01-12 09:31:13 | Re: Maybe BF "timedout" failures are the client script's fault? |
| Previous Message | Chao Li | 2026-01-12 09:21:49 | Re: [PATCH] Add sampling statistics to autoanalyze log output |