Re: Popcount optimization using AVX512

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>
Cc: "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Popcount optimization using AVX512
Date: 2024-02-07 10:13:14
Message-ID: 202402071013.reqe3bdnjjuj@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

This looks quite reasonable. On my machine, I get the compiler test to
pass so I get a "yes" in configure; but of course my CPU doesn't support
the instructions so I get the slow variant. So here's the patch again
with some minor artifacts fixed.

I have the following review notes:

1. we use __get_cpuid_count and __cpuidex by relying on macros
HAVE__GET_CPUID and HAVE__CPUID respectively; but those macros are (in
the current Postgres source) only used and tested for __get_cpuid and
__cpuid respectively. So unless there's some reason to be certain that
__get_cpuid_count is always present when __get_cpuid is present, and
that __cpuidex is present when __cpuid is present, I think we need to
add new configure tests and new HAVE_ macros for these.

2. we rely on <immintrin.h> being present with no AC_CHECK_HEADER()
test. We currently don't use this header anywhere, so I suppose we need
a test for this one as well. (Also, I suppose if we don't have
immintrin.h we can skip the rest of it?)

3. We do the __get_cpuid_count/__cpuidex test and we also do a xgetbv
test. The comment there claims that this is to check the results for
consistency. But ... how would we know that the results are ever
inconsistent? As far as I understand, if they were, we would silently
become slower. Is this really what we want? I'm confused about this
coding. Maybe we do need both tests to succeed? In that case, just
reword the comment.

I think if both tests are each considered reliable on its own, then we
could either choose one of them and stick with it, ignoring the other;
or we could use one as primary and then in a USE_ASSERT_CHECKING block
verify that the other matches and throw a WARNING if not (but what would
that tell us?). Or something like that ... not sure.

4. It needs meson support, which I suppose consists of copying the
c-compiler.m4 test into meson.build, mimicking what the tests for CRC
instructions do.

I started a CI run with this patch applied,
https://cirrus-ci.com/build/4912499619790848
but because Meson support is missing, the compile failed
immediately:

[10:08:48.825] ccache cc -Isrc/port/libpgport_srv.a.p -Isrc/include -I../src/include -Isrc/include/utils -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -pthread -DBUILDING_DLL -MD -MQ src/port/libpgport_srv.a.p/pg_bitutils.c.o -MF src/port/libpgport_srv.a.p/pg_bitutils.c.o.d -o src/port/libpgport_srv.a.p/pg_bitutils.c.o -c ../src/port/pg_bitutils.c
[10:08:48.825] ../src/port/pg_bitutils.c: In function ‘pg_popcount512_fast’:
[10:08:48.825] ../src/port/pg_bitutils.c:270:11: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
[10:08:48.825] 270 | __m512i accumulator = _mm512_setzero_si512();
[10:08:48.825] | ^~~~~~~~~~~
[10:08:48.825] In file included from /usr/lib/gcc/x86_64-linux-gnu/10/include/immintrin.h:55,
[10:08:48.825] from ../src/port/pg_bitutils.c:22:
[10:08:48.825] /usr/lib/gcc/x86_64-linux-gnu/10/include/avx512fintrin.h:339:1: error: inlining failed in call to ‘always_inline’ ‘_mm512_setzero_si512’: target specific option mismatch
[10:08:48.825] 339 | _mm512_setzero_si512 (void)
[10:08:48.825] | ^~~~~~~~~~~~~~~~~~~~
[10:08:48.825] ../src/port/pg_bitutils.c:270:25: note: called from here
[10:08:48.825] 270 | __m512i accumulator = _mm512_setzero_si512();
[10:08:48.825] | ^~~~~~~~~~~~~~~~~~~~~~

Thanks

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Siempre hay que alimentar a los dioses, aunque la tierra esté seca" (Orual)

Attachment Content-Type Size
v3-0001-Add-support-for-AVX512-implemented-POPCNT.patch text/x-diff 12.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2024-02-07 10:13:24 Re: pgsql: Add EXPLAIN (MEMORY) to report planner memory consumption
Previous Message Peter Eisentraut 2024-02-07 09:58:40 Re: Postgres and --config-file option