Re: Popcount optimization using AVX512

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Popcount optimization using AVX512
Date: 2024-04-02 22:01:32
Message-ID: 20240402220132.GA3101368@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 02, 2024 at 01:40:21PM -0500, Nathan Bossart wrote:
> On Tue, Apr 02, 2024 at 01:43:48PM -0400, Tom Lane wrote:
>> I don't like the double evaluation of the macro argument. Seems like
>> you could get the same results more safely with
>>
>> static inline uint64
>> pg_popcount(const char *buf, int bytes)
>> {
>> if (bytes < 64)
>> {
>> uint64 popcnt = 0;
>>
>> while (bytes--)
>> popcnt += pg_number_of_ones[(unsigned char) *buf++];
>>
>> return popcnt;
>> }
>> return pg_popcount_optimized(buf, bytes);
>> }
>
> Yeah, I like that better. I'll do some testing to see what the threshold
> really should be before posting an actual patch.

My testing shows that inlining wins with fewer than 8 bytes for the current
"fast" implementation. The "fast" implementation wins with fewer than 64
bytes compared to the AVX-512 implementation. These results are pretty
intuitive because those are the points at which the optimizations kick in.

In v21, 0001 is just the above inlining idea, which seems worth doing
independent of $SUBJECT. 0002 and 0003 are the AVX-512 patches, which I've
modified similarly to 0001, i.e., I've inlined the "fast" version in the
function pointer to avoid the function call overhead when there are fewer
than 64 bytes. All of this overhead juggling should result in choosing the
optimal popcount implementation depending on how many bytes there are to
process, roughly speaking.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v21-0001-inline-pg_popcount-for-small-numbers-of-bytes.patch text/x-diff 3.6 KB
v21-0002-AVX512-popcount-support.patch text/x-diff 29.6 KB
v21-0003-optimize-visibilitymap_count-with-AVX512.patch text/x-diff 11.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Cramer 2024-04-02 22:12:51 building postgres on windows using meson libxml2 not found
Previous Message Jeff Davis 2024-04-02 21:59:12 Re: Statistics Import and Export