[PATCH] CRC32C optimizations using SVE2 on ARM.

From: "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>, "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>
Subject: [PATCH] CRC32C optimizations using SVE2 on ARM.
Date: 2025-12-18 16:18:18
Message-ID: OSZPR01MB8499594722A7FC326706099C8BABA@OSZPR01MB8499.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hello,
This email proposes an optimized CRC32C implementation for ARM (aarch64) machines. The CRC32C function is widely used throughout PostgreSQL for checksum workloads such as WAL generation and data integrity validation.
The current CRC32C implementation on ARM relies on scalar hardware instructions (__crc32cb/ch/cw/cd) which process 1, 2, 4, or 8 bytes per iteration. While correct and efficient for smaller inputs, this scalar design becomes a bottleneck for large buffer sizes, leading to noticeable performance degradation.
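For readers unfamiliar with the scalar path described above, the following is a minimal sketch, not PostgreSQL's actual source. The function names (crc32c_bitwise, crc32c_scalar, crc32c_oneshot) are hypothetical. It assumes a little-endian target and uses the ACLE intrinsics only when the compiler defines __ARM_FEATURE_CRC32; otherwise it falls back to a portable bit-at-a-time loop so the example builds anywhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#define HAVE_ARM_CRC32C 1
#endif

/* Portable reference: one bit at a time, reflected polynomial 0x82F63B78. */
static uint32_t
crc32c_bitwise(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--)
	{
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/*
 * Scalar path: 8 bytes per iteration with __crc32cd, then a 4/2/1-byte tail.
 * memcpy is used for the loads to avoid alignment assumptions.
 */
static uint32_t
crc32c_scalar(uint32_t crc, const void *data, size_t len)
{
#ifdef HAVE_ARM_CRC32C
	const unsigned char *p = data;

	while (len >= 8)
	{
		uint64_t	v;

		memcpy(&v, p, 8);
		crc = __crc32cd(crc, v);
		p += 8;
		len -= 8;
	}
	if (len >= 4)
	{
		uint32_t	v;

		memcpy(&v, p, 4);
		crc = __crc32cw(crc, v);
		p += 4;
		len -= 4;
	}
	if (len >= 2)
	{
		uint16_t	v;

		memcpy(&v, p, 2);
		crc = __crc32ch(crc, v);
		p += 2;
		len -= 2;
	}
	if (len >= 1)
		crc = __crc32cb(crc, *p);
	return crc;
#else
	return crc32c_bitwise(crc, data, len);
#endif
}

/* Convenience wrapper applying the usual initial value and final XOR. */
static uint32_t
crc32c_oneshot(const void *data, size_t len)
{
	return crc32c_scalar(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

Each __crc32cd call depends on the CRC value produced by the previous one, which is the serial dependency chain that limits throughput on large buffers. As a sanity check, crc32c_oneshot("123456789", 9) should return the standard CRC-32C check value 0xE3069283.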
With the introduction of an SVE2-based implementation, we leverage wide vector intrinsics to process up to 128 bytes per iteration using 8 vectors in parallel. This design significantly accelerates CRC32C execution by reducing the total number of loop iterations, minimizing serial dependency chains, and improving compute and memory throughput.
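To make the reduction in loop iterations concrete, here is a small arithmetic sketch. The helper plan_loop and the struct loop_plan are hypothetical names used only for illustration. The 128-byte figure corresponds to 8 vectors of 16 bytes each, which assumes a 128-bit SVE vector length.

```c
#include <stddef.h>

/* How a buffer splits into full main-loop iterations plus a leftover tail. */
typedef struct
{
	size_t		iterations;		/* full iterations of the main loop */
	size_t		tail_bytes;		/* bytes left for the fallback/tail path */
} loop_plan;

static loop_plan
plan_loop(size_t len, size_t bytes_per_iter)
{
	loop_plan	p;

	p.iterations = len / bytes_per_iter;
	p.tail_bytes = len % bytes_per_iter;
	return p;
}
```

For an 8192-byte buffer, plan_loop(8192, 8) gives 1024 iterations for the 8-byte scalar loop, while plan_loop(8192, 128) gives 64 iterations for a 128-byte vector loop. Any tail shorter than 128 bytes would still need a narrower path.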
The implementation is designed for correctness, compatibility, and safe integration. It includes compile-time and runtime checks that detect SVE2 support in both the compiler and the underlying hardware. When SVE2 is unavailable, the function falls back to the existing scalar CRC32C path, so results are identical across systems.
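The runtime check and fallback could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in the attached patch: all function names are hypothetical, the choice of hwcap bits (SVE2 plus SVE PMULL) is a guess at what a kernel using carry-less multiplication would need, and the SVE2 kernel itself is replaced by a stub that calls the fallback. Detection uses getauxval(AT_HWCAP2) on Linux/aarch64; on other platforms it simply reports no support.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
/* Fallback definitions matching the Linux arm64 uapi hwcap bits. */
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2 (1 << 1)
#endif
#ifndef HWCAP2_SVEPMULL
#define HWCAP2_SVEPMULL (1 << 3)
#endif
#endif

typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

/* Assumption: the vector kernel needs both SVE2 and SVE PMULL. */
static bool
cpu_has_sve2_pmull(void)
{
#if defined(__linux__) && defined(__aarch64__)
	unsigned long hwcap2 = getauxval(AT_HWCAP2);

	return (hwcap2 & HWCAP2_SVE2) != 0 && (hwcap2 & HWCAP2_SVEPMULL) != 0;
#else
	return false;
#endif
}

/* Stand-in for the existing scalar path (portable bitwise CRC32C here). */
static uint32_t
crc32c_fallback(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--)
	{
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Stub: a real SVE2 kernel would go here; this sketch reuses the fallback. */
static uint32_t
crc32c_sve2_stub(uint32_t crc, const void *data, size_t len)
{
	return crc32c_fallback(crc, data, len);
}

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len);

/* Starts at the chooser, which replaces itself on the first call. */
static crc32c_fn crc32c_impl = crc32c_choose;

static uint32_t
crc32c_choose(uint32_t crc, const void *data, size_t len)
{
	crc32c_impl = cpu_has_sve2_pmull() ? crc32c_sve2_stub : crc32c_fallback;
	return crc32c_impl(crc, data, len);
}
```

Callers always go through crc32c_impl, so the hardware probe runs once and every later call goes straight to the selected implementation. Because both paths must compute the same function, the result is independent of which one is chosen.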
For architecture-specific functions, we use pg_attribute_target("arch=armv9-a+sve2-aes") to ensure precise compilation control without modifying global CFLAGS, enabling a clean integration within PostgreSQL’s build system.
System Configuration
Machine: AWS EC2 c8g.4xlarge (16 cores, 30 GB RAM)
OS: Ubuntu 22.04.5 LTS
GCC: 13.1.0
Benchmark and Results
Setup:
We used the CRC32C microbenchmark SQL function published on the PostgreSQL mailing list [0] to evaluate the performance of the SVE2 implementation against the existing scalar ARM version across multiple buffer sizes.
Query:
time SELECT drive_crc32c(1000000, bytes);
The experiment was executed for input sizes ranging from 8 bytes up to 32 MB.
Results:
Significant performance gains are observed starting from 128 bytes. For larger buffer sizes (≥ 1 KB), the SVE2 implementation achieves approximately 2-3 times speed-ups, with peak improvements observed for multi-megabyte inputs due to parallel folding and polynomial carry-less multiplication using 8 SVE2 vectors.
These improvements make CRC32C computation substantially faster for real PostgreSQL workloads involving large data blocks or WAL buffers.
We would like to contribute this work so that it becomes available to the PostgreSQL community. As part of the process, we are following the guidelines in the PostgreSQL wiki page "Submitting a Patch" <https://wiki.postgresql.org/wiki/Submitting_a_Patch>. The patch and the performance results are attached.
Please let us know if you have any queries or suggestions.
Thanks & Regards,
Susmitha Devanga.

[0] https://www.postgresql.org/message-id/attachment/169378/v10-0001-Add-a-Postgres-SQL-function-for-crc32c-benchmark.patch

Attachment Content-Type Size
v1-0001-crc32-sve2.patch application/octet-stream 1.1 MB
crc32c_arm.png image/png 91.7 KB
