Re: Popcount optimization using AVX512

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Popcount optimization using AVX512
Date: 2024-03-18 21:02:18
Message-ID: CAApHDvqyMNGVgwpaOPtENdq5uEMR=vSkRJEgG1S+X7Vtk1-EnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 19 Mar 2024 at 06:30, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> Here is a more fleshed-out version of what I believe David is proposing.
> On my machine, the gains aren't quite as impressive (~8.8s to ~5.2s for the
> test_popcount benchmark). I assume this is because this patch turns
> pg_popcount() into a function pointer, which is what the AVX512 patches do,
> too. I left out the 32-bit section from pg_popcount_fast(), but I'll admit
> that I'm not yet 100% sure that we can assume we're on a 64-bit system
> there.

I looked at your latest patch and tried out the performance on a Zen4
running windows and a Zen2 running on Linux. As follows:

AMD 3990x:

master:
postgres=# select drive_popcount(10000000, 1024);
Time: 11904.078 ms (00:11.904)
Time: 11907.176 ms (00:11.907)
Time: 11927.983 ms (00:11.928)

patched:
postgres=# select drive_popcount(10000000, 1024);
Time: 3641.271 ms (00:03.641)
Time: 3610.934 ms (00:03.611)
Time: 3663.423 ms (00:03.663)

AMD 7945HX Windows

master:
postgres=# select drive_popcount(10000000, 1024);
Time: 9832.845 ms (00:09.833)
Time: 9844.460 ms (00:09.844)
Time: 9858.608 ms (00:09.859)

patched:
postgres=# select drive_popcount(10000000, 1024);
Time: 3427.942 ms (00:03.428)
Time: 3364.262 ms (00:03.364)
Time: 3413.407 ms (00:03.413)

The only thing I'd question in the patch is in pg_popcount_fast(). It
looks like you've opted to not do the 32-bit processing on 32-bit
machines. I think that's likely still worth coding in a similar way to
how pg_popcount_slow() works. i.e. use "#if SIZEOF_VOID_P >= 8".
Probably one day we'll remove that code, but it seems strange to have
pg_popcount_slow() do it and not pg_popcount_fast().

> IMHO this work is arguably a prerequisite for the AVX512 work, as turning
> pg_popcount() into a function pointer will likely regress performance for
> folks on systems without AVX512 otherwise.

I think so too.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-03-18 21:08:10 Re: Popcount optimization using AVX512
Previous Message Robert Haas 2024-03-18 20:59:37 Re: Possibility to disable `ALTER SYSTEM`