Re: refactor architecture-specific popcount code

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: refactor architecture-specific popcount code
Date: 2026-01-15 09:07:51
Message-ID: CANWCAZb0U2qDTHCxPDyZugOTSE7xPURWKwcf8d4JmYTgH4+nEA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 15, 2026 at 3:40 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> Right now, the organization of this code is weird. All AArch64-specific
> implementations live in an AArch64-specific file, the AVX-512
> implementations live in their own file, and the architecture-agnostic and
> SSE4.2 implementations live in pg_bitutils.c. The attached patches move
> the SSE4.2 implementations to the AVX-512 file (which is renamed
> appropriately), and they update some function names to be more descriptive,
> i.e., "fast" is replaced with "sse42" and "slow" is replaced with
> "generic".

Thanks for taking on some technical debt!

0001

--- a/src/port/pg_popcount_avx512.c
+++ b/src/port/pg_popcount_x86_64.c

Can we get away with just "x86" for brevity? We generally don't target
32-bit CPUs for this kind of work, so there's no chance of confusion.

0002

```
-#ifdef USE_AVX512_POPCNT_WITH_RUNTIME_CHECK
+#include "port/pg_bitutils.h"
+
+#ifdef TRY_POPCNT_X86_64

#if defined(HAVE__GET_CPUID) || defined(HAVE__GET_CPUID_COUNT)
#include <cpuid.h>
#endif
```

With the above in the x86 .c file, I wonder we can get rid of this
stanza and the "try" symbol and gate only on HAVE_X86_64_POPCNTQ:

#ifdef HAVE_X86_64_POPCNTQ
#if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
#define TRY_POPCNT_X86_64 1
#endif
#endif

If we have to be cautious, we could just turn the #error on no CPUID
symbol into "return false".

0003

s/fast/sse42/:

Seems okay in this file, but this isn't the best name, either. Maybe a
comment to head off future "corrections", something like:
"Technically, POPCNT is not part of SSE 4.2, and is not even a vector
operation, but many compilers emit the popcnt instruction with
-msse4.2 anyway."

s/slow/generic/:

I'm ambivalent about this. The "slow" designation is flat-out wrong
since at least Power and aarch64 can emit a single instruction here
without prodding the compiler. On the other hand, "generic" seems
wrong too, since e.g. pg_popcount64_slow() has three configure symbols
and two compiler builtins. :-D

A possible future project would be to have a truly generic simple
fallback in pure C and put all the fancy stuff in the header for
architectures that have unconditional hardware support. It would make
more sense to revisit the name then.

--
John Naylor
Amazon Web Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuro Yamada 2026-01-15 09:20:52 Re: [PATCH] psql: add \dcs to list all constraints
Previous Message Corey Huinker 2026-01-15 08:59:27 Re: Extended Statistics set/restore/clear functions.