Re: refactor architecture-specific popcount code

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: refactor architecture-specific popcount code
Date: 2026-02-02 22:51:54
Message-ID: aYEqini0ukxQv2_D@nathan
Lists: pgsql-hackers

On Mon, Feb 02, 2026 at 09:16:42PM +0700, John Naylor wrote:
> It might be a good idea to do a little new testing, and I see a use
> for a special 8-byte path independent of AVX512: v6 seems to regress a
> little for single-words. But, it turns out that when gcc turns
> __builtin_popcountl into a single instruction, it's inline, but if it
> emits portable bitwise ops, it does so in a function called
> __popcountdi2(). That can be avoided by hand-coding in C for normal
> builds (and for 32-bit looks cleaner anyway), as in the attached 0005.

Oh, interesting. I looked into this a little more [0]. Both gcc and clang
generate cnt instructions for aarch64, so we're good there. However, clang
on x86-64 generates the bit-twiddling version, and gcc on x86-64 generates
a call to __popcountdi2() (which I imagine does something similar). It's
not until you provide a compiler flag like -march=x86-64-v2 that gcc/clang
start generating popcnt instructions for x86-64, which makes sense. 0005
seems like the correct move to me...

[0] https://godbolt.org/z/he3WozG3E
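
For anyone reading without the attachment, here is a minimal sketch of the
kind of hand-coded C that sidesteps the __popcountdi2() call on builds
without popcnt. It is the classic bit-twiddling (SWAR) popcount, not the
code from 0005, and the name popcount64_portable is made up for
illustration.

```c
#include <stdint.h>

/*
 * Portable 64-bit popcount using only shifts, masks, adds, and one multiply.
 * Hypothetical helper for illustration; not taken from the 0005 patch.
 */
static inline int
popcount64_portable(uint64_t x)
{
    /* Each 2-bit field now holds the number of set bits it contained. */
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));

    /* Sum adjacent 2-bit fields into 4-bit fields. */
    x = (x & UINT64_C(0x3333333333333333)) +
        ((x >> 2) & UINT64_C(0x3333333333333333));

    /* Sum adjacent 4-bit fields into per-byte counts (each at most 8). */
    x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);

    /* The multiply accumulates all eight byte counts into the top byte. */
    return (int) ((x * UINT64_C(0x0101010101010101)) >> 56);
}
```

Since this is plain C with no builtin involved, the compiler has no reason
to emit a libgcc call for it and is free to inline it whatever -march is
in effect.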

--
nathan
