Re: Proposal for enabling auto-vectorization for checksum calculations

From: Andrew Kim <tenistarkim(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Proposal for enabling auto-vectorization for checksum calculations
Date: 2025-10-17 07:15:40
Message-ID: CAK64mneN20+sW5WhV+r7hMVo4Rd0z11B6=3L039rWMt1wK3nPg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi John,

Thank you for your detailed and constructive feedback on the checksum
AVX2 optimization patch.
I've carefully addressed all of your concerns and am pleased to share
the updated V6 implementation.

V6 Implementation adds SIMD-optimized checksum calculation using AVX2
instructions with automatic fallback to portable implementation,
incorporating all of your recommended improvements:

1. Code Organization
Consolidated architecture: Moved all checksum logic into a single
checksum.c file, eliminating the complexity of separate dispatch files
Simplified build integration: Streamlined both autoconf and meson
build configurations
2. Safety & Robustness
Eliminated dangerous runtime patching: Replaced direct function
pointer manipulation with safe dispatch through static function
pointers
Thread-safe design: All operations are now inherently thread-safe
without requiring locks or synchronization
3. Code Readability
Removed macro complexity: Replaced PG_DECLARE_CHECKSUM_ISA macros with
explicit, clear function declarations
PostgreSQL coding compliance: Follows established PostgreSQL
conventions throughout
Simplified conditional compilation: Removed redundant __x86_64__
guards, relying on configure script's platform detection
4. Compiler Detection & Compatibility
Preserved robust testing: Maintained the comprehensive avx2_test
function that validates both __attribute__((target("avx2"))) support
and AVX2 intrinsics functionality
Runtime feature detection: Uses __builtin_cpu_supports("avx2") for
reliable CPU capability detection

Build cleanly across all library variants (static, shared, server)
Compile without warnings under strict compiler flags
I believe this V6 implementation fully addresses your concerns while
delivering the performance benefits of AVX2 optimization.

Please find the V6 patch attached. I welcome any additional feedback
you may have.

Best regards,
Andrew Kim

On Wed, Oct 1, 2025 at 10:26 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Thu, Sep 25, 2025 at 4:50 AM Andrew Kim <tenistarkim(at)gmail(dot)com> wrote:
> >
> > Thanks, John. I see the issue now — I’ll attach the entire patch
> > series in a single email so it shows up properly in the commitfest and
> > gets CI coverage.
>
> It's still picking up v4, and the archive link doesn't show any
> further replies. Something must have happened with the email
> threading, since you weren't on the thread at first. Please create an
> account and edit the entry to point to a more recent message ID:
>
> https://commitfest.postgresql.org/patch/5726/
>
> > Please find attached v6 of the patchset, updated per your feedback.
>
> Thanks. (BTW, we discourage top-posting and prefer to cut to size and
> use inline responses)
>
> This is not a complete review, but some architectural thoughts and
> some things I've noticed.
>
> The top of the checksum_impl.h has this:
>
> * This file exists for the benefit of external programs that may wish to
> * check Postgres page checksums. They can #include this to get the code
> * referenced by storage/checksum.h. (Note: you may need to redefine
> * Assert() as empty to compile this successfully externally.)
>
> It's going to be a bit tricky to preserve this ability while allowing
> the core server and client programs to dispatch to a specialized
> implementation, but we should at least try. That means keeping
> pg_checksum_block() and pg_checksum_page() where they live now.
>
> I think a good first refactoring patch would be to move
> src/backend/storage/checksum.c (which your patch doesn't even touch)
> to src/port (and src/include/storage/checksum.h to src/include/port)
> and have all callers use that. With that, I imagine only that
> checksum.c file would include checksum_impl.h.
>
> If that poses a problem, let us know -- we may have to further juggle
> things. If that works without issue, we can proceed with the
> specialization. On that, just a few things to note here, although the
> next patch doesn't need to worry about any of this yet:
>
> + #if defined(__has_attribute) && __has_attribute (target)
> + __attribute__((target("avx2")))
> + #endif
> + static int avx2_test(void)
> + {
> + const char buf@<:@sizeof(__m256i)@:>@;
> + __m256i accum = _mm256_loadu_si256((const __m256i *) buf);
> + accum = _mm256_add_epi32(accum, accum);
> + int result = _mm256_extract_epi32(accum, 0);
> + return (int) result;
> + }],
>
> If we're just testing if the target works, we can just use an empty
> function, right?
>
> +#define PG_DECLARE_CHECKSUM_ISA(ISANAME) \
> +uint32 \
> +pg_checksum_block_##ISANAME(const PGChecksummablePage *page);
> +
> +#define PG_DEFINE_CHECKSUM_ISA(ISANAME) \
> +pg_attribute_target(#ISANAME) \
> +uint32 pg_checksum_block_##ISANAME(const PGChecksummablePage *page) \
>
> I find this hard to read compared to just using the actual name.
>
> +avx2_available(void)
> +{
> +#if defined (USE_AVX2_WITH_RUNTIME_CHECK) && defined(__x86_64__)
>
> Why guard on __x86_64__?
>
> +PG_DEFINE_CHECKSUM_ISA(default)
> +{
> + uint32 sums[N_SUMS], result = 0;
> + uint32 i, j;
> [...]
>
> +#ifdef USE_AVX2_WITH_RUNTIME_CHECK
> +PG_DEFINE_CHECKSUM_ISA(avx2)
> +{
> + uint32 sums[N_SUMS], result = 0;
> + uint32 i, j;
> [...]
>
> With the single src/port file idea above, these would just do "return
> pg_checksum_block()" (or pg_checksum_page, whichever makes more
> sense).
>
> + if (avx2_available())
> + {
> + /* optional: patch pointer so next call goes directly */
> + pg_checksum_block = pg_checksum_block_avx2;
> + return pg_checksum_block_avx2(page);
> + }
>
> Not sure what your referring to here by "patching" the pointer, but it
> sounds dangerous. Besides, the cost of indirection is basically zero
> for multi-kilobyte inputs, so there is not even any motivation to
> consider doing differently.
>
> --
> John Naylor
> Amazon Web Services

Attachment Content-Type Size
v6-0001-Enable-autovectorizing-pg_checksum_block-with-AVX2-runtime-detection.patch application/octet-stream 17.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2025-10-17 07:17:19 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Previous Message Chao Li 2025-10-17 06:20:07 Re: Fix an unnecessary cast calling elog in ExecHashJoinImpl