Re: refactor architecture-specific popcount code

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Greg Burd <greg(at)burd(dot)me>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: refactor architecture-specific popcount code
Date: 2026-02-20 08:21:05
Message-ID: CANWCAZaCaWR0SKkrBcm9WWRHMRs-Mbj8Wr+on2m8TPk0BiMt4A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 5, 2026 at 4:43 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> Sure. I'm tempted to suggest that we only use the plain C version here,
> too. The SSE4.2 bms_num_members() test I did yesterday used it and showed
> improvement at one word. If we do that, we can rip out even more code
> since we no longer need the popcount built-ins.
>
> * tests plain C version on an Apple M3 *
>
> Yeah, the plain C version might be marginally slower than the built-in
> version for that test, but it still seems quite a bit faster than HEAD.
>
> HEAD  v8  v10
> 40    25  29

(for the following, numbers are nanoseconds per call from
drive_bms_num_members())

Seems similar on S390X / gcc 13.3 (last week I only tested a single
bitmapword and don't feel like repeating):

master (older): 4.1083 (call builtin)
v8: 2.8889 (inline builtin)
v10: 2.7961 (inline pure C)

On ppc64le / gcc 8.5, without native popcount it suffers:

words  master  v14
1      4.5     6.5
2      5.8     9.7
64     67.9    101
128    143     190

So one up, one down among obscure platforms. The case for keeping the
builtin now seems fairly thin, although it's not zero.
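
For anyone following along without the patch open, here is a rough sketch of
what "inline pure C" means in the tables above: a branch-free bit-twiddling
popcount with no compiler builtin, summed over an array of words. The names
popcount64_plain() and count_members() are made up for illustration and are
not taken from the patch, and the 64-bit word width is an assumption.

```c
#include <stdint.h>

/*
 * Count the set bits in one 64-bit word using only shifts, masks and a
 * multiply (the classic SWAR approach). No compiler builtin is used.
 * Hypothetical name, for illustration only.
 */
static inline int
popcount64_plain(uint64_t word)
{
	/* sum adjacent bits into 2-bit fields */
	word = word - ((word >> 1) & UINT64_C(0x5555555555555555));
	/* sum adjacent 2-bit fields into 4-bit fields */
	word = (word & UINT64_C(0x3333333333333333)) +
		((word >> 2) & UINT64_C(0x3333333333333333));
	/* sum adjacent 4-bit fields into bytes */
	word = (word + (word >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
	/* add all bytes together; the total lands in the top byte */
	return (int) ((word * UINT64_C(0x0101010101010101)) >> 56);
}

/*
 * Sum the popcounts of an array of words, roughly the shape of the loop
 * being timed per call. Hypothetical helper, not the patch's code.
 */
static int
count_members(const uint64_t *words, int nwords)
{
	int			result = 0;

	for (int i = 0; i < nwords; i++)
		result += popcount64_plain(words[i]);
	return result;
}
```

The builtin variant would differ only in the body of the per-word function,
which is why the comparison comes down to whether the target has a native
popcount instruction that the compiler is allowed to emit.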

--
John Naylor
Amazon Web Services
