Quick Links

Re: use ARM intrinsics in pg_lfind32() where available

From:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: use ARM intrinsics in pg_lfind32() where available
Date:	2022-08-29 04:25:50
Message-ID:	CAFBsxsEyR9JkfbPcDXBRYEfdfC__OkwVGdwEAgY4Rv0cvw35EA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sun, Aug 28, 2022 at 10:58 AM Nathan Bossart
<nathandbossart(at)gmail(dot)com> wrote:
>
> Here is a new patch set in which I've attempted to address all feedback.

Looks in pretty good shape. Some more comments:

+ uint32 nelem_per_vector = sizeof(Vector32) / sizeof(uint32);
+ uint32 nelem_per_iteration = 4 * nelem_per_vector;

Using local #defines would be my style. I don't have a reason to
object to this way, but adding const makes these vars more clear.
Speaking of const:

- const __m128i tmp1 = _mm_or_si128(result1, result2);
- const __m128i tmp2 = _mm_or_si128(result3, result4);
- const __m128i result = _mm_or_si128(tmp1, tmp2);
+ tmp1 = vector32_or(result1, result2);
+ tmp2 = vector32_or(result3, result4);
+ result = vector32_or(tmp1, tmp2);

Any reason to throw away the const declarations?

+static inline bool
+vector32_is_highbit_set(const Vector32 v)
+{
+#ifdef USE_SSE2
+ return (_mm_movemask_epi8(v) & 0x8888) != 0;
+#endif
+}

I'm not sure why we need this function -- AFAICS it just adds more
work on x86 for zero benefit. For our present application, can we just
cast to Vector8 (for Arm's sake) and call the 8-bit version?

Aside from that, I plan on rewriting some comments for commit, some of
which pre-date this patch:

- * operations using bitwise operations on unsigned integers.
+ * operations using bitwise operations on unsigned integers. Note that many
+ * of the functions in this file presently do not have non-SIMD
+ * implementations.

It's unclear to the reader whether this is a matter of 'round-to-it's.
I'd like to document what I asserted in this thread, that it's likely
not worthwhile to do anything with a uint64 representing two 32-bit
ints. (It *is* demonstrably worth it for handling 8 byte-values at a
time)

* Use saturating subtraction to find bytes <= c, which will present as
- * NUL bytes in 'sub'.
+ * NUL bytes.

I'd like to to point out that the reason to do it this way is to
workaround SIMD architectures frequent lack of unsigned comparison.

+ * Return the result of subtracting the respective elements of the input
+ * vectors using saturation.

I wonder if we should explain briefly what saturating arithmetic is. I
had never encountered it outside of a SIMD programming context.

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Re: use ARM intrinsics in pg_lfind32() where available at 2022-08-28 03:58:39 from Nathan Bossart

Responses

Re: use ARM intrinsics in pg_lfind32() where available at 2022-08-29 04:28:24 from Tom Lane
Re: use ARM intrinsics in pg_lfind32() where available at 2022-08-29 05:44:49 from Nathan Bossart
Re: use ARM intrinsics in pg_lfind32() where available at 2022-08-29 08:19:22 from John Naylor

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2022-08-29 04:28:24	Re: use ARM intrinsics in pg_lfind32() where available
Previous Message	Kyotaro Horiguchi	2022-08-29 04:21:17	Re: standby promotion can create unreadable WAL