Re: Popcount optimization using AVX512

From: Noah Misch <noah(at)leadboat(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>
Subject: Re: Popcount optimization using AVX512
Date: 2023-11-07 05:53:15
Message-ID: 20231107055315.8e@rfd.leadboat.com
Lists: pgsql-hackers

On Mon, Nov 06, 2023 at 09:59:26PM -0600, Nathan Bossart wrote:
> On Mon, Nov 06, 2023 at 07:15:01PM -0800, Noah Misch wrote:
> > On Mon, Nov 06, 2023 at 09:52:58PM -0500, Tom Lane wrote:
> >> Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> >> > Like I said, I don't have any proposals yet, but assuming we do want to
> >> > support newer intrinsics, either open-coded or via auto-vectorization, I
> >> > suspect we'll need to gather consensus for a new policy/strategy.
> >>
> >> Yeah. The function-pointer solution kind of sucks, because for the
> >> sort of operation we're considering here, adding a call and return
> >> is probably order-of-100% overhead. Worse, it adds similar overhead
> >> for everyone who doesn't get the benefit of the optimization.
> >
> > The glibc/gcc "ifunc" mechanism was designed to solve this problem of choosing
> > a function implementation based on the runtime CPU, without incurring function
> > pointer overhead. I would not attempt to use AVX512 on non-glibc systems, and
> > I would use ifunc to select the desired popcount implementation on glibc:
> > https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Function-Attributes.html
>
> Thanks, that seems promising for the function pointer cases. I'll plan on
> trying to convert one of the existing ones to use it. BTW it looks like
> LLVM has something similar [0].
>
> IIUC this unfortunately wouldn't help for cases where we wanted to keep
> stuff inlined, such as is_valid_ascii() and the functions in pg_lfind.h,
> unless we applied it to the calling functions, but that doesn't sound
> particularly maintainable.

Agreed, it doesn't solve inline cases. If the gains are big enough, we should
move toward packages containing N CPU-specialized copies of the postgres
binary, with bin/postgres just exec'ing the right one.
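
For the non-inline case, here is a rough sketch of what ifunc-based selection
could look like. It assumes GCC or Clang targeting an ELF platform with glibc.
All names (demo_popcount, resolve_popcount, popcount_portable, popcount_hw) are
hypothetical and are not existing PostgreSQL functions, and it selects plain
POPCNT rather than AVX512 to keep the example small.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until the byte is zero. */
static uint64_t
popcount_portable(const unsigned char *buf, size_t len)
{
    uint64_t    n = 0;

    for (size_t i = 0; i < len; i++)
    {
        unsigned char b = buf[i];

        while (b)
        {
            b &= (unsigned char) (b - 1);
            n++;
        }
    }
    return n;
}

#if defined(__x86_64__) || defined(__i386__)
/* Variant compiled with POPCNT enabled (x86 only in this sketch). */
__attribute__((target("popcnt")))
static uint64_t
popcount_hw(const unsigned char *buf, size_t len)
{
    uint64_t    n = 0;

    for (size_t i = 0; i < len; i++)
        n += (uint64_t) __builtin_popcount(buf[i]);
    return n;
}
#endif

typedef uint64_t (*popcount_fn) (const unsigned char *, size_t);

/*
 * Resolver: the dynamic loader runs this once while resolving the symbol,
 * and later calls go directly to the implementation it returned.
 */
static popcount_fn
resolve_popcount(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();       /* needed before cpu_supports in a resolver */
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hw;
#endif
    return popcount_portable;
}

/* Callers see an ordinary function; the loader binds it via the resolver. */
uint64_t    demo_popcount(const unsigned char *buf, size_t len)
            __attribute__((ifunc("resolve_popcount")));
```

Callers would just write demo_popcount(buf, len). As noted above, this does
nothing for code that needs to stay inlined.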
