Re: vectorized CRC on ARM64

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: vectorized CRC on ARM64
Date: 2026-04-03 08:22:59
Message-ID: CANWCAZa-4WN8_bPV=GKZii2b5kdwffdWreuykP6ogT_dRYhPdQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 2, 2026 at 11:17 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Thu, Apr 02, 2026 at 10:53:24AM -0500, Nathan Bossart wrote:
> > I think the new pg_comp_crc32_choose() is infinitely recursing on macOS
> > because USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK is not defined but
> > pg_crc32c_armv8_available() returns false. If I trace through that
> > function, I see that it's going straight to the
> >
> > #else
> > return false;
> > #endif
> >
> > at the end. And sure enough, both HAVE_ELF_AUX_INFO and HAVE_GETAUXVAL

Ah of course.

> > aren't defined in pg_config.h. I think we might need to use sysctlbyname()
> > to determine PMULL support on macOS, but at this stage of the development
> > cycle, I would probably lean towards just compiling in the sb8
> > implementation.
>
> Hm. On second thought, that probably regresses macOS builds because it was
> presumably using the armv8 path without runtime checks before...
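Here is one way the runtime check discussed above could be extended to macOS with sysctlbyname(). This is a minimal sketch under stated assumptions, not the actual pg_crc32c_armv8_available(). The function name pmull_available() is hypothetical. The Linux branch assumes getauxval(AT_HWCAP) with HWCAP_PMULL from <asm/hwcap.h>. The FreeBSD branch assumes elf_aux_info() with HWCAP_PMULL from <machine/elf.h>. The macOS branch assumes the sysctl name hw.optional.arm.FEAT_PMULL. On any other platform it reaches the same "return false" as the code being debugged.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#elif defined(__aarch64__) && defined(__FreeBSD__)
#include <sys/auxv.h>
#include <machine/elf.h>
#elif defined(__aarch64__) && defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: does the CPU support the PMULL instruction? */
static bool
pmull_available(void)
{
#if defined(__aarch64__) && defined(__linux__)
	/* Linux exposes ARMv8 feature bits through the auxiliary vector. */
	return (getauxval(AT_HWCAP) & HWCAP_PMULL) != 0;
#elif defined(__aarch64__) && defined(__FreeBSD__)
	/* FreeBSD uses elf_aux_info() instead of getauxval(). */
	unsigned long hwcap = 0;

	if (elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) != 0)
		return false;
	return (hwcap & HWCAP_PMULL) != 0;
#elif defined(__aarch64__) && defined(__APPLE__)
	/* macOS has no auxv; the sysctl name here is an assumption. */
	int			value = 0;
	size_t		len = sizeof(value);

	if (sysctlbyname("hw.optional.arm.FEAT_PMULL", &value, &len, NULL, 0) != 0)
		return false;
	return value != 0;
#else
	/* Not ARM64, or no known probe: report no support. */
	return false;
#endif
}
```

Note the final #else branch. If no probe can run, the answer is "no", so the caller must already have a working implementation installed before it trusts this result.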

I went with the following for v5, and it passes on macOS in my GitHub CI:

+ /* set fallbacks */
+#ifdef USE_ARMV8_CRC32C
+ /* On e.g. MacOS, our runtime feature detection doesn't work */
+ pg_comp_crc32c = pg_comp_crc32c_armv8;
+#else
+ pg_comp_crc32c = pg_comp_crc32c_sb8;
+#endif
+ [...crc and pmull checks]

That should keep scalar hardware support working, but now it'll only
use direct calls for constant inputs.
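The recursion comes from the usual function-pointer dispatch pattern. The pointer initially refers to the chooser, which is expected to overwrite it before calling through it. If no branch assigns the pointer, the chooser calls itself forever. Assigning a fallback first, as the v5 excerpt does, rules that out. Below is a minimal, self-contained sketch of the pattern. All names in it (crc_fn, comp_crc, crc_choose, crc32c_bitwise, cpu_has_crc, cpu_has_pmull) are hypothetical stand-ins. The fallback is a plain bitwise CRC-32C rather than the slicing-by-8 code, and the "upgraded" paths reuse it as placeholders.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc_fn) (uint32_t crc, const void *data, size_t len);

/* Portable bitwise CRC-32C (reflected, polynomial 0x82F63B78). */
static uint32_t
crc32c_bitwise(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--)
	{
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

/* Hypothetical feature probes; both report "unsupported" in this sketch. */
static bool cpu_has_crc(void) { return false; }
static bool cpu_has_pmull(void) { return false; }

static uint32_t crc_choose(uint32_t crc, const void *data, size_t len);

/* Starts out pointing at the chooser; the first call replaces it. */
static crc_fn comp_crc = crc_choose;

static uint32_t
crc_choose(uint32_t crc, const void *data, size_t len)
{
	/* Install the fallback first, so the pointer can never still be us. */
	comp_crc = crc32c_bitwise;

	/* Upgrade only when runtime detection positively succeeds. */
	if (cpu_has_crc())
		comp_crc = crc32c_bitwise;	/* placeholder for a scalar CRC-instruction path */
	if (cpu_has_crc() && cpu_has_pmull())
		comp_crc = crc32c_bitwise;	/* placeholder for a PMULL vector path */

	return comp_crc(crc, data, len);
}
```

With the usual pre- and post-inversion, comp_crc(0xFFFFFFFF, "123456789", 9) ^ 0xFFFFFFFF gives the standard CRC-32C check value 0xE3069283. After that first call, comp_crc points at the fallback.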

I also did some benchmarking on an ARM Neoverse N1 with GCC 8.3
(attached). There the vector loop still works well all the way down to
the minimum input size of 64 bytes, and on long inputs it's almost
twice as fast as scalar. For reproducibility, I slightly modified the
benchmark we used last year to make sure the input is aligned
(attached, but not for CI). Ultimately, I want to add a length check so
that inputs smaller than 80 bytes go straight to the scalar path.
For inputs of 80 bytes or more, even after the alignment adjustments in
the preamble, that still guarantees at least one iteration of the vector loop.
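The 80-byte threshold can be checked with a little arithmetic. Assume the preamble processes bytes with scalar code until the pointer is 16-byte aligned; that alignment target is an assumption, not stated above. The preamble then consumes at most 15 bytes, so any input of 80 bytes or more leaves at least 65 bytes, which covers one 64-byte iteration. Below is a minimal sketch. The macro and function names are hypothetical and do not come from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BLOCK	64	/* bytes consumed per vector loop iteration */
#define VEC_MIN_LEN	80	/* proposed cutoff: shorter inputs stay scalar */
#define VEC_ALIGN	16	/* assumed alignment target of the preamble */

/* Bytes the preamble handles with scalar code to reach alignment. */
static size_t
preamble_bytes(uintptr_t addr)
{
	return (VEC_ALIGN - (addr % VEC_ALIGN)) % VEC_ALIGN;
}

/* Would the proposed length check route this input to the vector path? */
static bool
use_vector_path(size_t len)
{
	return len >= VEC_MIN_LEN;
}

/* Number of full 64-byte vector iterations left after the preamble. */
static size_t
vector_iterations(uintptr_t addr, size_t len)
{
	size_t		pre = preamble_bytes(addr);

	return len > pre ? (len - pre) / VEC_BLOCK : 0;
}
```

In the worst case (an address 1 byte past a 16-byte boundary), the preamble takes 15 bytes. An 80-byte input then leaves 65 bytes, which is one full iteration. Under this assumption the tight bound would be 79 bytes, so 80 leaves a byte of slack.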

--
John Naylor
Amazon Web Services

Attachment                                                               Content-Type               Size
v5-crc-n1-bench.txt                                                      text/plain                 1.6 KB
test-crc.sh                                                              application/x-shellscript  276 bytes
v5-0001-Compute-CRC32C-on-ARM-using-the-Crypto-Extension-.patch          text/x-patch               16.0 KB
v503-0002-Add-a-Postgres-SQL-function-for-crc32c-benchmar.patch.nocfbot  application/octet-stream   6.6 KB
