| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | Andrew Pogrebnoi <andrew(dot)pogrebnoi(at)percona(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Popcount optimization for the slow-path lookups |
| Date: | 2025-12-05 15:39:51 |
| Message-ID: | aTL8x7hylAkptvBi@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Dec 05, 2025 at 03:07:07PM +0200, Andrew Pogrebnoi wrote:
> I want to propose an optimization for pg_popcount32_slow() and
> pg_popcount64_slow() where lookups into pg_number_of_ones[] are made
> branchless. It shows speedup around 58% for uint64 and 35% for uint32 words
> compared to the current, looped version. This is on x86. It is much more
> significant on Arm64 (Apple M1): ~x2.6 for uint64 and ~x2.25 for uint32
> words.
>
> I probably have to guard the lookup in pg_popcount64_slow() with `#if
> SIZEOF_LONG == 8` like it's done for __builtin_popcountl() in the same
> function. What do you think?
>
> For benchmarks, I used functions with the copies of the lookup loop from
> pg_popcount32/64_slow(), measuring them against branchless counterparts.
> Then in C++ I pre generated two static vectors with random uint64's and
> uint32's, each of 5M size and ran lookups using google/benchmark [0]. I've
> run it on M1 and x86 Macs.
I don't think the proposed improvements are relevant for either of the
machines you used for your benchmarks. For x86, we've optimized our
popcount code to use SSE4.2 or AVX-512, and for AArch64, we've optimized it
to use Neon or SVE. And for other systems, we still try to use
__builtin_popcount() and friends in the fallback paths, which IIUC are
available on both gcc and clang (and maybe elsewhere). IMHO we need to run
the benchmarks on a compiler/architecture combination where it would
actually be used in practice.
--
nathan
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Hannu Krosing | 2025-12-05 15:50:24 | Re: making tid and HOTness of UPDATE available to logical decoding plugins |
| Previous Message | Peter Eisentraut | 2025-12-05 15:33:22 | commented out code |