From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: autovectorize page checksum code included elsewhere
(Unfortunately, I'm posting this too late for the November commitfest, but
I'm hoping this will be the first in a series of proposed improvements
involving SIMD instructions for v17.)
Presently, we ask compilers to autovectorize checksum.c and numeric.c. The
page checksum code actually lives in checksum_impl.h, and checksum.c just
includes it. But checksum_impl.h is also used in pg_upgrade/file.c and
pg_checksums.c, and since we don't ask compilers to autovectorize those
files, the page checksum code may remain un-vectorized.
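For readers unfamiliar with why the flags matter here, the following is a simplified sketch of the kind of loop involved. It is not the code from checksum_impl.h: the names lane_checksum, mix, and N_LANES and the per-lane seed values are hypothetical, chosen only for illustration. The point is that the inner loop updates many independent running sums, so a compiler can map it onto SIMD lanes, but only in translation units where vectorization is enabled. With GCC, CFLAGS_VECTORIZE and CFLAGS_UNROLL_LOOPS correspond to -ftree-vectorize and -funroll-loops; without them, a file that includes such a header may get a scalar loop at -O2, depending on the compiler version.

```c
#include <stddef.h>
#include <stdint.h>

#define N_LANES 32              /* hypothetical number of parallel sums */
#define MIX_PRIME 16777619u     /* the 32-bit FNV prime */

/* Mix one 32-bit word into one lane's running sum. */
static inline uint32_t
mix(uint32_t sum, uint32_t value)
{
    uint32_t tmp = sum ^ value;

    return (tmp * MIX_PRIME) ^ (tmp >> 17);
}

/*
 * Checksum nwords 32-bit words.  This sketch assumes nwords is a
 * multiple of N_LANES.
 */
static uint32_t
lane_checksum(const uint32_t *words, size_t nwords)
{
    uint32_t sums[N_LANES];
    uint32_t result = 0;
    size_t   i, j;

    /* Arbitrary per-lane seeds (an assumption made for this sketch). */
    for (j = 0; j < N_LANES; j++)
        sums[j] = (uint32_t) j + 1;

    /*
     * Each lane j depends only on sums[j] and words[i + j], so the
     * iterations of the inner loop are independent of one another.
     * This is the loop the vectorizer can turn into SIMD operations.
     */
    for (i = 0; i < nwords; i += N_LANES)
        for (j = 0; j < N_LANES; j++)
            sums[j] = mix(sums[j], words[i + j]);

    /* Fold the lanes into a single 32-bit value. */
    for (j = 0; j < N_LANES; j++)
        result ^= sums[j];

    return result;
}
```

Because the function body lives in a header-style unit, every file that includes it compiles its own copy, and each copy is optimized according to the flags of the file that includes it.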
The attached patch is a quick attempt at adding CFLAGS_UNROLL_LOOPS and
CFLAGS_VECTORIZE to the CFLAGS for the aforementioned objects. The gains
are modest (i.e., some system CPU and/or a few percentage points on the
total time), but it seemed like a no-brainer.
Separately, I'm wondering whether we should consider using CFLAGS_VECTORIZE
on the whole tree. Commit fdea253 seems to be responsible for introducing
this targeted autovectorization strategy, and AFAICT this was just done to
minimize the impact elsewhere while optimizing page checksums. Are there
fundamental problems with adding CFLAGS_VECTORIZE everywhere? Or is it
just waiting on someone to do the analysis/benchmarking?
Amazon Web Services: https://aws.amazon.com