Re: Popcount optimization for the slow-path lookups

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Andrew Pogrebnoi <andrew(dot)pogrebnoi(at)percona(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Popcount optimization for the slow-path lookups
Date: 2025-12-05 15:39:51
Message-ID: aTL8x7hylAkptvBi@nathan
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 05, 2025 at 03:07:07PM +0200, Andrew Pogrebnoi wrote:
> I want to propose an optimization for pg_popcount32_slow() and
> pg_popcount64_slow() where lookups into pg_number_of_ones[] are made
> branchless. It shows speedup around 58% for uint64 and 35% for uint32 words
> compared to the current, looped version. This is on x86. It is much more
> significant on Arm64 (Apple M1): ~x2.6 for uint64 and ~x2.25 for uint32
> words.
>
> I probably have to guard the lookup in pg_popcount64_slow() with `#if
> SIZEOF_LONG == 8` like it's done for __builtin_popcountl() in the same
> function. What do you think?
>
> For benchmarks, I used functions with the copies of the lookup loop from
> pg_popcount32/64_slow(), measuring them against branchless counterparts.
> Then in C++ I pre generated two static vectors with random uint64's and
> uint32's, each of 5M size and ran lookups using google/benchmark [0]. I've
> run it on M1 and x86 Macs.

I don't think the proposed improvements are relevant for either of the
machines you used for your benchmarks. For x86, we've optimized our
popcount code to use SSE4.2 or AVX-512, and for AArch64, we've optimized it
to use Neon or SVE. And for other systems, we still try to use
__builtin_popcount() and friends in the fallback paths, which IIUC are
available on both gcc and clang (and maybe elsewhere). IMHO we need to run
the benchmarks on a compiler/architecture combination where it would
actually be used in practice.

--
nathan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2025-12-05 15:50:24 Re: making tid and HOTness of UPDATE available to logical decoding plugins
Previous Message Peter Eisentraut 2025-12-05 15:33:22 commented out code