RE: Popcount optimization using AVX512

From: "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Matthias van de Meent" <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Popcount optimization using AVX512
Date: 2024-03-04 21:39:36
Message-ID: BL1PR11MB5304BFE26BAC25624508CFD2DC232@BL1PR11MB5304.namprd11.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

First, apologies on the patch. Find re-attached updated version.

Now I have some questions....
#1

> -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
> +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31)
> +<< 31))
>
> IME this means that the autoconf you are using has been patched. A quick search on the mailing lists seems to indicate that it might be specific to Debian [1].

I am not sure what the ask is here? I made changes to the configure.ac and ran autoconf2.69 to get builds to succeed. Do you have a separate feedback here?

#2
As for the refactoring, this was done to satisfy previous review feedback about applying the AVX512 CFLAGS to the entire pg_bitutils.c file. Mainly to avoid segfault due to the AVX512 flags. If its ok, I would prefer to make a single commit as the change is pretty small and straight forward.

#3
I am not sure I understand the comment about the SIZE_VOID_P checks. Aren't they necessary to choose which functions to call based on 32 or 64 bit architectures?

#4
Would this change qualify for Workflow A as described in [0] and can be picked up by a committer, given it has been reviewed by multiple committers so far? The scope of the change is pretty contained as well.

[0] https://wiki.postgresql.org/wiki/Submitting_a_Patch

Thanks,
Paul

-----Original Message-----
From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Sent: Friday, March 1, 2024 1:45 PM
To: Amonson, Paul D <paul(dot)d(dot)amonson(at)intel(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>; Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>; Shankaran, Akash <akash(dot)shankaran(at)intel(dot)com>; Noah Misch <noah(at)leadboat(dot)com>; Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>; pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Popcount optimization using AVX512

Thanks for the new version of the patch. I didn't see a commitfest entry for this one, and unfortunately I think it's too late to add it for the March commitfest. I would encourage you to add it to July's commitfest [0] so that we can get some routine cfbot coverage.

On Tue, Feb 27, 2024 at 08:46:06PM +0000, Amonson, Paul D wrote:
> After consulting some Intel internal experts on MSVC the linking issue
> as it stood was not resolved. Instead, I created a MSVC ONLY work-around.
> This adds one extra functional call on the Windows builds (The linker
> resolves a real function just fine but not a function pointer of the
> same name). This extra latency does not exist on any of the other
> platforms. I also believe I addressed all issues raised in the
> previous reviews. The new pg_popcnt_x86_64_accel.c file is now the
> ONLY file compiled with the
> AVX512 compiler flags. I added support for the MSVC compiler flag as
> well. Both meson and autoconf are updated with the new refactor.
>
> I am attaching the new patch.

I think this patch might be missing the new files.

-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31)
+<< 31))

IME this means that the autoconf you are using has been patched. A quick search on the mailing lists seems to indicate that it might be specific to Debian [1].

-static int pg_popcount32_slow(uint32 word);
-static int pg_popcount64_slow(uint64 word);
+int pg_popcount32_slow(uint32 word);
+int pg_popcount64_slow(uint64 word);
+uint64 pg_popcount_slow(const char *buf, int bytes);

This patch appears to do a lot of refactoring. Would it be possible to break out the refactoring parts into a prerequisite patch that could be reviewed and committed independently from the AVX512 stuff?

-#if SIZEOF_VOID_P >= 8
+#if SIZEOF_VOID_P == 8
/* Process in 64-bit chunks if the buffer is aligned. */
- if (buf == (const char *) TYPEALIGN(8, buf))
+ if (buf == (const char *)TYPEALIGN(8, buf))
{
- const uint64 *words = (const uint64 *) buf;
+ const uint64 *words = (const uint64 *)buf;

while (bytes >= 8)
{
@@ -309,9 +213,9 @@ pg_popcount(const char *buf, int bytes)
bytes -= 8;
}

- buf = (const char *) words;
+ buf = (const char *)words;
}
-#else
+#elif SIZEOF_VOID_P == 4
/* Process in 32-bit chunks if the buffer is aligned. */
if (buf == (const char *) TYPEALIGN(4, buf))
{

Most, if not all, of these changes seem extraneous. Do we actually need to more strictly check SIZEOF_VOID_P?

[0] https://commitfest.postgresql.org/48/
[1] https://postgr.es/m/20230211020042.uthdgj72kp3xlqam%40awork3.anarazel.de

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v5-0001-Add-support-for-AVX512-implemented-POPCNT.patch application/octet-stream 31.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-03-04 21:50:48 Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set
Previous Message Daniel Gustafsson 2024-03-04 21:03:13 Re: Adding deprecation notices to pgcrypto documentation