Re: Popcount optimization using AVX512

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Popcount optimization using AVX512
Date: 2024-04-02 15:53:01
Message-ID: 20240402155301.GA2750455@nathanxps13
Lists: pgsql-hackers

On Mon, Apr 01, 2024 at 05:11:17PM -0500, Nathan Bossart wrote:
> Here is a v19 of the patch set. I moved out the refactoring of the
> function pointer selection code to 0001. I think this is a good change
> independent of $SUBJECT, and I plan to commit this soon. In 0002, I
> changed the syslogger.c usage of pg_popcount() to use pg_number_of_ones
> instead. This is standard practice elsewhere where the popcount functions
> are unlikely to win. I'll probably commit this one soon, too, as it's even
> more trivial than 0001.
>
> 0003 is the AVX512 POPCNT patch. Besides refactoring out 0001, there are
> no changes from v18. 0004 is an early proof-of-concept for using AVX512
> for the visibility map code. The code is missing comments, and I haven't
> performed any benchmarking yet, but I figured I'd post it because it
> demonstrates how it's possible to build upon 0003 in other areas.

I've committed the first two patches, and I've attached a rebased version
of the latter two.

> AFAICT the main open question is the function call overhead in 0003 that
> Alvaro brought up earlier. After 0002 is committed, I believe the only
> in-tree caller of pg_popcount() with very few bytes is bit_count(), and I'm
> not sure it's worth expending too much energy to make sure there are
> absolutely no regressions there. However, I'm happy to do so if folks feel
> that it is necessary, and I'd be grateful for thoughts on how to proceed on
> this one.

Another idea I had is to turn pg_popcount() into a macro that just uses the
pg_number_of_ones array when called for only a few bytes:

	static inline uint64
	pg_popcount_inline(const char *buf, int bytes)
	{
		uint64		popcnt = 0;

		while (bytes--)
			popcnt += pg_number_of_ones[(unsigned char) *buf++];

		return popcnt;
	}

	#define pg_popcount(buf, bytes) \
		(((bytes) < 64) ? \
		 pg_popcount_inline((buf), (bytes)) : \
		 pg_popcount_optimized((buf), (bytes)))

But again, I'm not sure this is really worth it for the current use-cases.
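
For illustration only, here is a rough standalone sketch of the dispatch idea above that compiles outside the PostgreSQL tree. It is not the patch's code. The names popcount_inline, popcount_optimized, and popcount_dispatch are hypothetical, the 16-entry nibble table is a stand-in for the 256-entry pg_number_of_ones array, and the "optimized" path is a plain bit-clearing loop standing in for whatever the function pointer would select (e.g., the AVX512 version).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16-entry nibble table standing in for pg_number_of_ones[256]. */
static const uint8_t nibble_ones[16] = {
	0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Small-input path: table lookups only, cheap enough to inline at call sites. */
static inline uint64_t
popcount_inline(const char *buf, int bytes)
{
	uint64_t	popcnt = 0;

	assert(bytes >= 0);
	while (bytes--)
	{
		unsigned char c = (unsigned char) *buf++;

		popcnt += nibble_ones[c & 0x0F] + nibble_ones[c >> 4];
	}
	return popcnt;
}

/*
 * Hypothetical stand-in for the out-of-line, function-pointer-selected
 * implementation.  It clears the lowest set bit until the byte is zero.
 */
static uint64_t
popcount_optimized(const char *buf, int bytes)
{
	uint64_t	popcnt = 0;

	assert(bytes >= 0);
	while (bytes--)
	{
		unsigned char c = (unsigned char) *buf++;

		for (; c != 0; c &= (unsigned char) (c - 1))
			popcnt++;
	}
	return popcnt;
}

/* Inputs shorter than 64 bytes take the inline path; the rest go out of line. */
#define popcount_dispatch(buf, bytes) \
	(((bytes) < 64) ? \
	 popcount_inline((buf), (bytes)) : \
	 popcount_optimized((buf), (bytes)))
```

One caveat with a macro like this is that the bytes argument is evaluated twice, so callers should not pass an expression with side effects. When bytes is a compile-time constant, the compiler can usually discard the branch that is not taken.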

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v20-0001-AVX512-popcount-support.patch text/x-diff 28.7 KB
v20-0002-optimize-visibilitymap_count-with-AVX512.patch text/x-diff 9.4 KB
