Re: use ARM intrinsics in pg_lfind32() where available

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: use ARM intrinsics in pg_lfind32() where available
Date: 2022-08-27 22:12:34
Message-ID: 20220827221234.GA15951@nathanxps13
Lists: pgsql-hackers

Thanks for taking a look.

On Sat, Aug 27, 2022 at 01:59:06PM +0700, John Naylor wrote:
> I don't foresee any use of emulating vector registers with uint64 if
> they only hold two ints. I wonder if it'd be better if all vector32
> functions were guarded with #ifndef USE_NO_SIMD. (I wonder if
> declarations without definitions cause warnings...)

Yeah. I was a bit worried about the readability of this file with so many
#ifndefs, but after trying it out, I suppose it doesn't look _too_ bad.

> + * NB: This function assumes that each lane in the given vector either has all
> + * bits set or all bits zeroed, as it is mainly intended for use with
> + * operations that produce such vectors (e.g., vector32_eq()). If this
> + * assumption is not true, this function's behavior is undefined.
> + */
>
> Hmm?

Yup. The problem is that AFAICT there's no equivalent to
_mm_movemask_epi8() on aarch64, so you end up with something like

vmaxvq_u8(vandq_u8(v, vector8_broadcast(0x80))) != 0

But for pg_lfind32(), we really just want to know if any lane is set, which
only requires a call to vmaxvq_u32(). I haven't had a chance to look too
closely, but my guess is that this ultimately results in an extra AND
operation in the aarch64 path, so maybe it doesn't impact performance too
much. The other option would be to open-code the intrinsic function calls
into pg_lfind.h. I'm trying to avoid the latter, but maybe it's the right
thing to do for now... What do you think?
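
To make the trade-off concrete, here is a rough sketch of the "is any lane set" check on each path. It is not the code from the patch: the vec32_* names and block_contains() are hypothetical, and the scalar branch exists only so the sketch builds on machines without SSE2 or NEON. Both SIMD branches rely on the assumption from the quoted comment, namely that every lane is either all ones or all zeroes.

```c
#include <stdbool.h>
#include <stdint.h>

/* All vec32_* names are hypothetical, chosen for this sketch only. */
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
typedef uint32x4_t vec32;

static inline vec32 vec32_load(const uint32_t *p) { return vld1q_u32(p); }
static inline vec32 vec32_broadcast(uint32_t c) { return vdupq_n_u32(c); }
static inline vec32 vec32_eq(vec32 a, vec32 b) { return vceqq_u32(a, b); }

/* A single horizontal max; no per-lane bitmask is built. */
static inline bool vec32_any_lane_set(vec32 v) { return vmaxvq_u32(v) != 0; }

#elif defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vec32;

static inline vec32 vec32_load(const uint32_t *p)
{
    return _mm_loadu_si128((const __m128i *) p);
}
static inline vec32 vec32_broadcast(uint32_t c) { return _mm_set1_epi32((int) c); }
static inline vec32 vec32_eq(vec32 a, vec32 b) { return _mm_cmpeq_epi32(a, b); }

/* movemask collects the high bit of every byte into an int. */
static inline bool vec32_any_lane_set(vec32 v) { return _mm_movemask_epi8(v) != 0; }

#else
/* Scalar stand-in so the sketch still compiles without SIMD. */
typedef struct { uint32_t lane[4]; } vec32;

static inline vec32 vec32_load(const uint32_t *p)
{
    vec32 v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = p[i];
    return v;
}
static inline vec32 vec32_broadcast(uint32_t c)
{
    vec32 v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = c;
    return v;
}
static inline vec32 vec32_eq(vec32 a, vec32 b)
{
    vec32 v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = (a.lane[i] == b.lane[i]) ? UINT32_MAX : 0;
    return v;
}
static inline bool vec32_any_lane_set(vec32 v)
{
    return (v.lane[0] | v.lane[1] | v.lane[2] | v.lane[3]) != 0;
}
#endif

/* Does any of the four elements at base equal key? */
static bool block_contains(const uint32_t *base, uint32_t key)
{
    return vec32_any_lane_set(vec32_eq(vec32_load(base), vec32_broadcast(key)));
}
```

With a block holding {1, 2, 3, 4}, block_contains() should report a hit for 3 and a miss for 7 on all three paths. A general high-bit test on aarch64 would additionally need the AND with a broadcast 0x80 shown above before the horizontal max.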

> -#elif defined(USE_SSE2)
> +#elif defined(USE_SSE2) || defined(USE_NEON)
>
> I think we can just say #else.

Yes.

> -#if defined(USE_SSE2)
> - __m128i sub;
> +#ifndef USE_NO_SIMD
> + Vector8 sub;
>
> +#elif defined(USE_NEON)
> +
> + /* use the same approach as the USE_SSE2 block above */
> + sub = vqsubq_u8(v, vector8_broadcast(c));
> + result = vector8_has_zero(sub);
>
> I think we should invent a helper that does saturating subtraction and
> call that, inlining the sub var so we don't need to mess with it
> further.

Good idea, will do.
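
For reference, such a helper might look roughly like the sketch below. The vec8_* names and chunk_has_le() are placeholders rather than the names the patch will use, and the scalar branch is only there so the sketch compiles without SSE2 or NEON. The trick is the one from the quoted block: with unsigned saturating subtraction, a byte of v - c is zero exactly when that byte of v is <= c.

```c
#include <stdbool.h>
#include <stdint.h>

/* All vec8_* names are placeholders for this sketch only. */
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
typedef uint8x16_t vec8;

static inline vec8 vec8_load(const uint8_t *p) { return vld1q_u8(p); }
static inline vec8 vec8_broadcast(uint8_t c) { return vdupq_n_u8(c); }
static inline vec8 vec8_ssub(vec8 a, vec8 b) { return vqsubq_u8(a, b); }
static inline bool vec8_has_zero(vec8 v) { return vminvq_u8(v) == 0; }

#elif defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vec8;

static inline vec8 vec8_load(const uint8_t *p)
{
    return _mm_loadu_si128((const __m128i *) p);
}
static inline vec8 vec8_broadcast(uint8_t c) { return _mm_set1_epi8((char) c); }
static inline vec8 vec8_ssub(vec8 a, vec8 b) { return _mm_subs_epu8(a, b); }
static inline bool vec8_has_zero(vec8 v)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128())) != 0;
}

#else
/* Scalar stand-in so the sketch still compiles without SIMD. */
typedef struct { uint8_t lane[16]; } vec8;

static inline vec8 vec8_load(const uint8_t *p)
{
    vec8 v;
    for (int i = 0; i < 16; i++)
        v.lane[i] = p[i];
    return v;
}
static inline vec8 vec8_broadcast(uint8_t c)
{
    vec8 v;
    for (int i = 0; i < 16; i++)
        v.lane[i] = c;
    return v;
}
static inline vec8 vec8_ssub(vec8 a, vec8 b)
{
    vec8 v;
    for (int i = 0; i < 16; i++)
        v.lane[i] = (a.lane[i] > b.lane[i]) ? (uint8_t) (a.lane[i] - b.lane[i]) : 0;
    return v;
}
static inline bool vec8_has_zero(vec8 v)
{
    for (int i = 0; i < 16; i++)
        if (v.lane[i] == 0)
            return true;
    return false;
}
#endif

/* Is any of the 16 bytes at chunk less than or equal to c? */
static bool chunk_has_le(const uint8_t *chunk, uint8_t c)
{
    return vec8_has_zero(vec8_ssub(vec8_load(chunk), vec8_broadcast(c)));
}
```

With the subtraction inlined into the call like this, the caller no longer needs a separate sub variable or a per-architecture #if block.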

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
