Re: add AVX2 support to simd.h

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: add AVX2 support to simd.h
Date: 2024-03-21 17:09:44
Message-ID: 20240321170944.GA1767527@nathanxps13
Lists: pgsql-hackers

On Thu, Mar 21, 2024 at 11:30:30AM +0700, John Naylor wrote:
> I'm much happier about v5-0001. With a small tweak it would match what
> I had in mind:
>
> + if (nelem < nelem_per_iteration)
> + goto one_by_one;
>
> If this were "<=" then for long arrays we could assume there is
> always more than one block, and wouldn't need to check if any elements
> remain -- first block, then a single loop and it's done.
>
> The loop could also then be a "do while" since it doesn't have to
> check the exit condition up front.

Good idea. That causes us to re-check all of the tail elements when the
number of elements is evenly divisible by nelem_per_iteration, but that
might be worth the trade-off.
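
To make the trade-off concrete, here is a minimal scalar sketch of the control flow being discussed. It is not the code from the attached patches: the names lfind32_sketch, block_contains, NELEM_PER_ITERATION and the blocks_checked counter are all hypothetical, and a plain loop stands in for the SIMD block comparison. It assumes the "<=" check suggested above, so the block path always has at least one full block plus a non-empty tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size standing in for nelem_per_iteration. */
#define NELEM_PER_ITERATION 16

/* Scalar stand-in for comparing one full block of elements at once. */
static bool
block_contains(uint32_t key, const uint32_t *block)
{
	bool		found = false;

	for (size_t i = 0; i < NELEM_PER_ITERATION; i++)
		found |= (block[i] == key);
	return found;
}

/*
 * Sketch of the search: short arrays go element by element, long arrays
 * use a do-while over full blocks followed by one overlapping final block.
 * *blocks_checked (illustrative only) counts the block comparisons made.
 */
static bool
lfind32_sketch(uint32_t key, const uint32_t *base, size_t nelem,
			   size_t *blocks_checked)
{
	size_t		i = 0;
	size_t		tail_idx;

	*blocks_checked = 0;

	/* "<=" rather than "<": exactly one block's worth also goes here. */
	if (nelem <= NELEM_PER_ITERATION)
	{
		for (; i < nelem; i++)
		{
			if (base[i] == key)
				return true;
		}
		return false;
	}

	/* Index just past the last full block; at least one block exists. */
	tail_idx = nelem - (nelem % NELEM_PER_ITERATION);
	assert(tail_idx >= NELEM_PER_ITERATION);

	/* No up-front exit check is needed, so a do-while suffices. */
	do
	{
		(*blocks_checked)++;
		if (block_contains(key, &base[i]))
			return true;
		i += NELEM_PER_ITERATION;
	} while (i < tail_idx);

	/*
	 * The final block overlaps elements already checked.  When nelem is a
	 * multiple of the block size, the whole final block is a re-check.
	 */
	(*blocks_checked)++;
	return block_contains(key, &base[nelem - NELEM_PER_ITERATION]);
}
```

In this sketch, a failed search over 32 elements performs three block comparisons (two in the loop, plus a final block that is entirely redundant), while 17 elements take two, with only the last element being new in the final block.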

> Yes, that spike is weird, because it seems super-linear. However, the
> more interesting question for me is: AVX2 isn't really buying much for
> the numbers covered in this test. Between 32 and 48 elements, and
> between 64 and 80, it's indistinguishable from SSE2. The jumps to the
> next shelf are postponed, but the jumps are just as high. From earlier
> system benchmarks, I recall it eventually wins out with hundreds of
> elements, right? Is that still true?

It does still eventually win, although not nearly to the same extent as
before. I extended the benchmark a bit to show this. I wouldn't be
devastated if we only got 0001 committed for v17, given these results.

> Further, now that the algorithm is more SIMD-appropriate, I wonder
> what doing 4 registers at a time is actually buying us for either SSE2
> or AVX2. It might just be a matter of scale, but that would be good to
> understand.

I'll follow up with these numbers shortly.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:
  v6-0001-pg_lfind32-add-overlap-code-for-remaining-element.patch (text/x-diff, 3.8 KB)
  v6-0002-Add-support-for-AVX2-in-simd.h.patch (text/x-diff, 4.8 KB)
  unnamed image (image/jpeg, 23.3 KB)
  unnamed image (image/jpeg, 20.0 KB)
