| From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | Greg Burd <greg(at)burd(dot)me>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: refactor architecture-specific popcount code |
| Date: | 2026-02-04 05:13:35 |
| Message-ID: | CANWCAZa_d9yZ-V7eU57M9UzB=BY+BUFDtdiF79bY5oyWYYSYLA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Feb 4, 2026 at 3:44 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> Okay, here is an updated patch set with first drafts of the commit
> messages. I'm reasonably happy with these patches, but I'll admit my
> justification for ripping out the 32-bit optimizations feels a bit flimsy.
> I don't get the idea that we are all that concerned about things like
> micro-regressions for popcount on 32-bit builds, but OTOH it isn't hard to
> imagine someone objecting to these changes.
I actually believe the changes improve even on 32-bit, at least for
bitmapsets. The previous 32-bit optimizations are just mirrors of the
64-bit ones, which we found weren't always great to begin with, so I
don't buy that we're losing much here.
> I was going to run it on machines with SVE/AVX-512, but John already tested
> the AVX-512 case [0], and I have no reason to believe that we'll see
> regressions on machines with SVE.
Actually, that was the SSE4.2 case. Sorry if I wasn't clear.
For v9:
0001:
- /*
- * We set the threshold to the point at which we'll first use special
- * instructions in the optimized version.
- */
-#if SIZEOF_VOID_P >= 8
- int threshold = 8;
-#else
- int threshold = 4;
-#endif
-
- if (bytes < threshold)
+ if (bytes < 8)
It seems like we should still have some kind of comment here. Even
just to say the 8 value was found through testing. (It was some time
ago, right?)
0002:
+static inline int
+pg_popcount32(uint32 word)
+{
+ /*
+ * On x86, gcc generates a function call for this built-in unless the
+ * popcnt instruction is available, so we use the plain C version in that
+ * case to ensure inlining.
+ */
+#if defined(HAVE__BUILTIN_POPCOUNT) && (defined(__POPCNT__) ||
!defined(__x86_64__))
+ return __builtin_popcount(word);
It was intentional that my PoC pg_popcount32 was simple, pure C and
didn't have #ifdefs anymore, and I'd prefer to go back to that. This
function is now only used for single-word bitmapsets on 32-bit
platforms and in an assert, and even if that changes we've shown that
inlining bitwise ops for a single word is already pretty good for
performance. Plus, doesn't this cause gcc generate a function call on
32-bit x86 because "!defined(__x86_64__)" is true? That defeats the
whole purpose of inlining in the first place. Simple is good here.
+#elif defined(_MSC_VER)
+ return __popcnt64(word);
The commit message says "converts the portable ones to inlined
functions", but this was copied from a architecture specific file with
a runtime check. I've seen an assertion in this thread that the
hardware instruction is required for some Windows version, but it
would be nice to have a link to documentation for the archives. More
worryingly, this is almost certainly broken on 32-bit, and the
buildfarm won't tell us -- please see commit
53ea2b7ad050ce4ad95c89bb55197209b65886a1 and bug report that led to
it. Seems like material for a separate commit.
0003:
- for (int i = 0; i < RT_BM_IDX(RT_NODE_MAX_SLOTS); i++)
- cnt += bmw_popcount(n256->isset[i]);
+ cnt += pg_popcount((const char *) n256->isset,
+ RT_NODE_MAX_SLOTS / BITS_PER_BYTE);
This can now be "cnt =". The initialization to zero is now
unnecessary, but it's also harmless.
LGTM otherwise.
"perhaps-over-optimized" -- In more neutral language, they're better
optimized for longer inputs. Hence the need to shortcut for the common
case.
"This fast-path could arguably go in pg_popcount() itself (with an
appropriate alignment check), but that is left as a future exercise."
I wondered about that too, but that adds branches and code everywhere
it's called.
--
John Naylor
Amazon Web Services
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Corey Huinker | 2026-02-04 05:14:06 | Re: Add expressions to pg_restore_extended_stats() |
| Previous Message | Yugo Nagata | 2026-02-04 05:07:31 | Warn when creating or enabling a subscription with max_logical_replication_workers = 0 |