Re: Proposal for enabling auto-vectorization for checksum calculations

From: Andrew Kim <tenistarkim(at)gmail(dot)com>
To: Oleg Tselebrovskiy <o(dot)tselebrovskiy(at)postgrespro(dot)ru>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Proposal for enabling auto-vectorization for checksum calculations
Date: 2025-10-24 08:09:56
Message-ID: CAK64mneZowPiNO7e-=rgetxfG7kMGhrhxX=Zc-7ve7s2hyQ_OA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Oleg,

Thank you for the detailed review on v7 patch.

On Mon, Oct 20, 2025 at 8:05 AM Oleg Tselebrovskiy
<o(dot)tselebrovskiy(at)postgrespro(dot)ru> wrote:
>
> Thanks for the new patch version!
>
> Another round of review:
>
> 1) I think that changes to contrib/pageinspect/rawpage.c should be in
> the main patch, not in the benchmark patch. Also, without those chages
> the main patch can't compile using make world-bin
>
This is already correctly handled in v8. The
contrib/pageinspect/rawpage.c change is in the main patch (v8-0001),
not in the benchmark patch. The include statement was updated from
#include "storage/checksum.h" to #include "port/checksum.h" in the
refactoring patch, which is the correct placement.

> 2) I still don't get why you check for working intrinsics in configure,
> config/c-compiler.m4 and meson.build, if your patch later uses them.
> I've gotten correct assembly code with this avx2_test function:
> #include <stdint.h>
> #if defined(__has_attribute) && __has_attribute (target)
> __attribute__((target("avx2")))
> static int avx2_test(void)
> {
> return 0;
> }
> #endif
> Please, check if this works for you and consider using something similar
>

I agree. In v8, I've simplified the configure tests significantly. The
config/c-compiler.m4 now uses exactly the pattern you suggested:

> 3) __builtin_cpu_supports doesn't work on Windows at all. We still have
> to use approach with __get_cpuid
>

Completely fixed in v8. I've removed all usage of
__builtin_cpu_supports and implemented proper cross-platform CPU
detection using __get_cpuid (Linux/GCC) and __cpuid (Windows/MSVC)
with proper preprocessor guards

> 4) Looks like you can safely remove "port/checksum_impl.h" from
> src/bin/pg_checksums/pg_checksums.c. It probably links with libpgport
> and/or libpgcommon, so it gets pg_checksum_page from there. Same with
> src/bin/pg_upgrade/file.c. Maybe those includes are "for clarity" and
> you don't need to remove them, but pg_checksums and pg_upgrade seem to
> work without them
>

In v8-0001, both files now only include "port/checksum.h". The direct
inclusion of checksum_impl.h has been removed:

src/bin/pg_checksums/pg_checksums.c: Only includes "port/checksum.h"
src/bin/pg_upgrade/file.c: Only includes "port/checksum.h"

> 5) You don't need #include <string.h> /* for memcpy */ in
> checksum_impl.h. At the very least, memcpy was used before your patch
> without string.h
>

Confirmed there's no explicit #include <string.h> in the v8
checksum_impl.h. The memcpy usage relies on the standard PostgreSQL
includes.

> 6) Why did you remove Assert(sizeof(PGChecksummablePage) == BLCKSZ)? Is
> it always false?
>
I didn't remove it - it's still present in both implementations in
v8. In both pg_checksum_block_default and pg_checksum_block_avx2, you
can see:

/* ensure that the size is compatible with the algorithm */
Assert(sizeof(PGChecksummablePage) == BLCKSZ);

> 7) Is reformatted variable declaration in pg_checksum_block_default_impl
> really needed? Is there a good reason for it? Or is it auto-formatting
> programm output?
>

Sorry, that's my mistake, In v8, I've kept the variable declarations
consistent with PostgreSQL style without unnecessary reformatting. The
declarations in both functions follow the same pattern as the original
code.

> 8) Your patch removes one whitespace in this line - for (i = 0; i <
> (uint32)(BLCKSZ / (sizeof(uint32) * N_SUMS)); i++)
> If you wish to fix formatting like that - please, do it in a separate
> patch. If this was done automatically by some formatting tool - please,
> revert this change
>

In v8, I've preserved the original formatting. The for loop maintains
the original spacing: for (i = 0; i < (uint32) (BLCKSZ /
(sizeof(uint32) * N_SUMS)); i++) with proper space after (uint32).

> 9) Unneeded empty string
> #define FNV_PRIME 16777619
>
> +
> /* Use a union so that this code is valid under strict aliasing */
> typedef union
>

Fixed, removed unnecessary blank lines in v8.

> 10) You need one line with just /* at the beginning of the comment, look
> at other multiline comments in this file
> + /* For now, AVX2 implementation is identical to default
> + * The compiler will auto-vectorize this with proper flags
> + * Future versions could use explicit AVX2 intrinsics here
> */
>

Fixed, it's started style with the opening /* on its own line:
/*
* AVX2-optimized block checksum algorithm.
* Same algorithm as default, but compiled with AVX2 target for
auto-vectorization.
*/

> 11) Function pg_checksum_block_simple isn't used at all.
>

There's no pg_checksum_block_simple function in v8. The implementation
only has the necessary functions: pg_checksum_block_default,
pg_checksum_block_avx2, and pg_checksum_block_choose.

> 12) Why do you need those?
> +#ifndef PG_CHECKSUM_EXTERNAL_INTERFACE
> +#define PG_CHECKSUM_EXTERNAL_INTERFACE
>

These macros are not present in v8. The implementation is cleaner
without unnecessary preprocessor guards.

> 13) Object files are added according to alphabetical order, not logical
> order (src/port/Makefile)
> pg_popcount_aarch64.o \
> pg_popcount_avx512.o \
> + checksum.o \
> pg_strong_random.o \
> pgcheckdir.o \
>

In v8, checksum.o is correctly placed in alphabetical order in the
OBJS list in src/port/Makefile:

OBJS = \
$(LIBOBJS) \
$(PG_CRC32C_OBJS) \
bsearch_arg.o \
checksum.o \
chklocale.o \

> 14) I still think that src/port/checksum.c needs to just include
> src/include/port/checksum_impl.h and have no other logic to keep
> checksum_impl.h's role as "header with full implementation"
> Now checksum_impl.h doesn't have any mention of pg_checksum_page
>

The current v8 approach has checksum.c simply include
checksum_impl.h, which maintains the "header with full implementation"
pattern you prefer. However, the function pointer mechanism and
runtime detection logic is in checksum_impl.h, which means
pg_checksum_page (the external interface) is also defined there. This
keeps the external interface clean while maintaining the
implementation details in the header.

> 15) Assembly for pg_checksum_block_choose now has full code of
> pg_checksum_block_default. This is probably a result of using inline
> functions
> Don't know if this is bad, but it is at least strange
>

I think that is expected behavior with the function pointer approach.
The compiler inlines the first call, but subsequent calls use the
cached function pointer, which is the standard PostgreSQL pattern for
runtime CPU feature detection (see CRC32C implementation).

> Also, some CFBot checks have failed. Two of them with this error/warning
> checksum.c:88:1: error: no previous prototype for ‘pg_checksum_page’
> [-Werror=missing-prototypes]
> 88 | pg_checksum_page(char *page, BlockNumber blkno)
> | ^~~~~~~~~~~~~~~~
> Please, address those
>

In v8, pg_checksum_page is declared in src/include/port/checksum.h,
which is included by checksum.c. This should resolve the missing
prototype error.

> Oleg Tselebrovskiy, PostgresPro

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-10-24 08:38:37 Re: Making pg_rewind faster
Previous Message Michael Paquier 2025-10-24 07:56:33 Re: Making pg_rewind faster