
Re: add AVX2 support to simd.h

From:	John Naylor <johncnaylorls(at)gmail(dot)com>
To:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc:	Ants Aasma <ants(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: add AVX2 support to simd.h
Date:	2024-01-09 02:20:09
Message-ID:	CANWCAZYsnwxT2YjUQBWcc2QqOoE6xyEtTkqD6kw2pKCX1xevrg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jan 9, 2024 at 12:37 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> > I suspect that there could be a regression lurking for some inputs
> > that the benchmark doesn't look at: pg_lfind32() currently needs to be
> > able to read 4 vector registers' worth of elements before taking the
> > fast path. There is then a tail of up to 15 elements that are now
> > checked one-by-one, but AVX2 would increase that to 31. That's getting
> > big enough to be noticeable, I suspect. It would be good to understand
> > that case (n*32 + 31), because it may also be relevant now. It's also
> > easy to improve for SSE2/NEON for v17.
>
> Good idea. If it is indeed noticeable, we might be able to "fix" it by
> processing some of the tail with shorter vectors. But that probably means
> finding a way to support multiple vector sizes on the same build, which
> would require some work.
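For reference, the structure the quoted text describes (a main loop that only runs over whole blocks of four registers, followed by a one-by-one tail) can be sketched in portable C. This is a hypothetical illustration, not PostgreSQL's pg_lfind32(): the names lfind32_blocked(), block_contains() and scalar_tail_len() are invented here, and a plain loop stands in for the vector compare. The per_iter parameter is assumed to be the number of uint32 elements in four registers: 16 with 128-bit SSE2/NEON registers, 32 with 256-bit AVX2 registers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one fast-path iteration: a real version
 * would compare four vector registers against the key at once. No early
 * exit inside the block, to mimic a vector compare.
 */
static bool
block_contains(uint32_t key, const uint32_t *block, size_t per_iter)
{
	bool		found = false;

	for (size_t j = 0; j < per_iter; j++)
		found |= (block[j] == key);
	return found;
}

/* Elements left for the one-by-one loop; per_iter must be a power of 2. */
static size_t
scalar_tail_len(size_t nelem, size_t per_iter)
{
	/* round down to a multiple of elements per iteration */
	size_t		tail_idx = nelem & ~(per_iter - 1);

	return nelem - tail_idx;
}

static bool
lfind32_blocked(uint32_t key, const uint32_t *base, size_t nelem,
				size_t per_iter)
{
	size_t		tail_idx = nelem & ~(per_iter - 1);
	size_t		i;

	/* fast path: only whole blocks of per_iter elements */
	for (i = 0; i < tail_idx; i += per_iter)
		if (block_contains(key, &base[i], per_iter))
			return true;

	/* up to per_iter - 1 remaining elements, checked one at a time */
	for (; i < nelem; i++)
		if (base[i] == key)
			return true;
	return false;
}
```

With per_iter = 32, an input of n*32 + 31 elements leaves 31 elements for the scalar loop (scalar_tail_len(127, 32) is 31), whereas with per_iter = 16 the tail never exceeds 15.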

What I had in mind was an overlapping pattern I've seen in various
places: do one iteration at the beginning, then subtract the
aligned-down length from the end and do all those iterations from
there (the first of them overlaps the initial one). One-by-one
checking is then only used if the total length is smaller than a
single iteration.
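A minimal sketch of that overlapping pattern, under the same assumptions as any illustration here: the names lfind32_overlap(), block_has_key() and BLOCK are hypothetical, BLOCK is fixed at 32 (four AVX2 registers of eight uint32 lanes each), and a plain loop stands in for one vector iteration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 32				/* assumed: 4 AVX2 registers x 8 uint32 lanes */

/* Hypothetical stand-in for one vector iteration over BLOCK elements. */
static bool
block_has_key(uint32_t key, const uint32_t *p)
{
	bool		found = false;

	for (size_t j = 0; j < BLOCK; j++)
		found |= (p[j] == key);
	return found;
}

static bool
lfind32_overlap(uint32_t key, const uint32_t *base, size_t nelem)
{
	/* one-by-one only when there isn't even one full block */
	if (nelem < BLOCK)
	{
		for (size_t i = 0; i < nelem; i++)
			if (base[i] == key)
				return true;
		return false;
	}

	/* one iteration at the beginning covers base[0 .. BLOCK-1] */
	if (block_has_key(key, base))
		return true;

	/*
	 * Subtract the aligned-down length from the end: the remaining
	 * iterations start at nelem % BLOCK (< BLOCK), so they overlap the
	 * first block instead of leaving a scalar tail, and finish exactly
	 * at nelem.
	 */
	size_t		aligned = nelem & ~(size_t) (BLOCK - 1);

	for (size_t i = nelem - aligned; i < nelem; i += BLOCK)
		if (block_has_key(key, &base[i]))
			return true;

	return false;
}
```

Because the later iterations start at nelem % BLOCK, they re-check part or all of the first block (all of it when nelem is an exact multiple of BLOCK). That is harmless for a membership test like pg_lfind32(), which only asks whether any element matches, but the same trick would need care for operations that count or modify elements.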

In response to

Re: add AVX2 support to simd.h at 2024-01-08 17:37:15 from Nathan Bossart

Responses

Re: add AVX2 support to simd.h at 2024-01-09 16:20:09 from Nathan Bossart
