Re: add AVX2 support to simd.h

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: add AVX2 support to simd.h
Date: 2024-03-15 17:41:49
Message-ID: 20240315174149.GA3837823@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 10, 2024 at 09:06:08AM +0700, John Naylor wrote:
> If we have say 25 elements, I mean (for SSE2) check the first 16, then
> the last 16. Some will be checked twice, but that's okay.

I finally got around to trying this. 0001 adds this overlapping logic.
0002 is a rebased version of the AVX2 patch (it needed some updates after
commit 9f225e9). And 0003 is a benchmark for test_lfind32(). It runs
pg_lfind32() on an array of the given size 100M times.

I've also attached the results of running this benchmark on my machine at
HEAD, after applying 0001, and after applying both 0001 and 0002. 0001
appears to work pretty well. When there is a small "tail," it regresses a
small amount, but overall, it seems to improve more cases than it harms.
0002 does regress searches on smaller arrays quite a bit, since it
postpones the SIMD optimizations until the arrays are longer. It might be
possible to mitigate by using 2 registers when the "tail" is long enough,
but I have yet to try that.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v2-0001-pg_lfind32-process-tail-with-SIMD-intructions.patch text/x-diff 1.5 KB
v2-0002-add-avx2-support-in-simd.h.patch text/x-diff 4.7 KB
v2-0003-test_lfind32-benchmark.patch text/x-diff 3.7 KB
image/jpeg 28.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amonson, Paul D 2024-03-15 17:43:39 RE: Popcount optimization using AVX512
Previous Message Jeff Davis 2024-03-15 17:14:15 Re: MERGE ... RETURNING