Re: Popcount optimization using AVX512

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Popcount optimization using AVX512
Date: 2024-04-06 19:41:01
Message-ID: 20240406194101.GA533391@nathanxps13
Lists: pgsql-hackers

On Sat, Apr 06, 2024 at 02:51:39PM +1300, David Rowley wrote:
> On Sat, 6 Apr 2024 at 14:17, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>> On Sat, Apr 06, 2024 at 12:08:14PM +1300, David Rowley wrote:
>> > Won't Valgrind complain about this?
>> >
>> > +pg_popcount_avx512(const char *buf, int bytes)
>> >
>> > + buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf);
>> >
>> > + val = _mm512_maskz_loadu_epi8(mask, (const __m512i *) buf);
>>
>> I haven't been able to generate any complaints, at least with some simple
>> tests. But I see your point. If this did cause such complaints, ISTM we'd
>> just want to add it to the suppression file. Otherwise, I think we'd have
>> to go back to the non-maskz approach (which I really wanted to avoid
>> because of the weird function overhead juggling) or find another way to do
>> a partial load into an __m512i.
>
> [1] seems to think it's ok. If this is true then the following
> shouldn't segfault:
>
> The following seems to run without any issue and if I change the mask
> to 1 it crashes, as you'd expect.

Cool.
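The masked-load idea in the quoted snippet can be illustrated without AVX-512 hardware. The following is a minimal scalar C sketch, not the code in the v28 patch. It assumes a 64-byte chunk (the width of one `__m512i`) and works in three steps. First, it treats the starting position as rounded down to a 64-byte boundary, which is the `TYPEALIGN_DOWN` step. Second, it builds a 64-bit mask that disables the lanes before `buf` in the first chunk and the lanes past `buf + bytes` in the last chunk. Third, it counts only the enabled lanes, standing in for `_mm512_maskz_loadu_epi8` followed by a vector popcount. All function names are hypothetical. Unlike the real intrinsic, the sketch never forms the aligned-down pointer. It indexes relative to `buf` and reads a lane only when its mask bit is set, so only in-bounds bytes are touched. That mirrors the fault-suppression behavior for masked-out lanes that the test above relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 64 /* assumed: bytes per __m512i register */

/* Portable 8-bit popcount (stand-in for the vector popcount instruction). */
static unsigned popcount8(unsigned char b)
{
    unsigned n = 0;
    while (b)
    {
        n += b & 1u;
        b >>= 1;
    }
    return n;
}

/*
 * Scalar stand-in for a masked zeroing load plus popcount on one chunk.
 * Lane i maps to buf[chunk_base + i] and is read only if bit i of mask is
 * set, so masked-out lanes (before buf or past its end) are never touched.
 */
static uint64_t count_masked_chunk(const unsigned char *buf,
                                   ptrdiff_t chunk_base, uint64_t mask)
{
    uint64_t total = 0;

    for (int lane = 0; lane < CHUNK; lane++)
        if ((mask >> lane) & 1u)
            total += popcount8(buf[chunk_base + lane]);
    return total;
}

/* Hypothetical scalar sketch of the aligned-down, masked-load popcount. */
static uint64_t popcount_masked_sketch(const char *cbuf, size_t bytes)
{
    const unsigned char *buf = (const unsigned char *) cbuf;
    uint64_t total = 0;

    if (bytes == 0)
        return 0;

    /* How far buf sits past the preceding 64-byte boundary. */
    size_t head = (uintptr_t) buf % CHUNK;
    /* Index of that boundary relative to buf (the TYPEALIGN_DOWN step). */
    ptrdiff_t base = -(ptrdiff_t) head;
    /* First chunk: disable the lanes that precede buf. */
    uint64_t mask = ~UINT64_C(0) << head;

    for (;;)
    {
        /* Distance from this chunk's first lane to one past buf's last byte. */
        ptrdiff_t remaining = (ptrdiff_t) bytes - base;

        if (remaining <= CHUNK)
        {
            /* Final chunk: also disable lanes past the end of the buffer. */
            mask &= ~UINT64_C(0) >> (CHUNK - remaining);
            return total + count_masked_chunk(buf, base, mask);
        }
        total += count_masked_chunk(buf, base, mask);
        base += CHUNK;
        mask = ~UINT64_C(0); /* middle chunks lie entirely inside buf */
    }
}

/* Reference: byte-at-a-time popcount used to cross-check the sketch. */
static uint64_t popcount_naive(const unsigned char *buf, size_t bytes)
{
    uint64_t total = 0;

    for (size_t i = 0; i < bytes; i++)
        total += popcount8(buf[i]);
    return total;
}

/* Compare against the naive count for every start offset within a chunk. */
static void check_against_naive(void)
{
    static unsigned char data[512];

    for (size_t i = 0; i < sizeof data; i++)
        data[i] = (unsigned char) (i * 37u + 11u);
    for (size_t off = 0; off < CHUNK; off++)
        for (size_t len = 0; len <= 300; len++)
            assert(popcount_masked_sketch((const char *) data + off, len) ==
                   popcount_naive(data + off, len));
}
```

Calling `check_against_naive()` cross-checks the sketch against a byte-at-a-time count for every starting offset within a 64-byte chunk and for lengths up to 300 bytes. This exercises the cases where the first and last chunks coincide and where they are separated by full middle chunks.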

Here is what I have staged for commit, which I intend to push shortly. At
some point, I'd like to revisit converting TRY_POPCNT_FAST to a
configure-time check and maybe even moving the "fast" and "slow"
implementations to their own files, but since that's mostly for code
neatness and we are rapidly approaching the v17 deadline, I'm content to
leave that for v18.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:
v28-0001-Optimize-pg_popcount-with-AVX-512-instructions.patch (text/x-diff, 30.9 KB)
v28-0002-Optimize-visibilitymap_count-with-AVX-512-instru.patch (text/x-diff, 12.5 KB)
