| From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | Greg Burd <greg(at)burd(dot)me>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: refactor architecture-specific popcount code |
| Date: | 2026-02-20 08:21:05 |
| Message-ID: | CANWCAZaCaWR0SKkrBcm9WWRHMRs-Mbj8Wr+on2m8TPk0BiMt4A@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, Feb 5, 2026 at 4:43 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> Sure. I'm tempted to suggest that we only use the plain C version here,
> too. The SSE4.2 bms_num_members() test I did yesterday used it and showed
> improvement at one word. If we do that, we can rip out even more code
> since we no longer need the popcount built-ins.
>
> * tests plain C version on an Apple M3 *
>
> Yeah, the plain C version might be marginally slower than the built-in
> version for that test, but it still seems quite a bit faster than HEAD.
>
> HEAD v8 v10
> 40 25 29
(for the following, numbers are nanoseconds per call from
drive_bms_num_members())
Seems similar on S390X / gcc 13.3 (last week I only tested a single
bitmapword and don't feel like repeating it):
master (older): 4.1083 (call builtin)
v8: 2.8889 (inline builtin)
v10: 2.7961 (inline pure C)
On ppc64le / gcc 8.5, without native popcount it suffers:
words  master  v14
1      4.5     6.5
2      5.8     9.7
64     67.9    101
128    143     190
So one up, one down among obscure platforms. The case for the builtin
now seems fairly thin, although it's not zero.
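For anyone following along without the patch open, here is a rough sketch of what a "plain C" popcount can look like. It is the well-known SWAR bit-twiddling approach and is not copied from any patch version in this thread. The names popcount64_plain() and count_members() are made up for illustration, and count_members() is only a stand-in for a per-word loop, not the real bms_num_members().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Portable popcount for one 64-bit word, using no compiler builtins.
 * Hypothetical name; classic SWAR technique.
 */
static inline int
popcount64_plain(uint64_t x)
{
    /* Sum adjacent bits into 2-bit fields. */
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
    /* Sum adjacent 2-bit fields into 4-bit fields. */
    x = (x & UINT64_C(0x3333333333333333)) +
        ((x >> 2) & UINT64_C(0x3333333333333333));
    /* Sum adjacent 4-bit fields into per-byte counts. */
    x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
    /* Multiplying adds all byte counts into the top byte. */
    return (int) ((x * UINT64_C(0x0101010101010101)) >> 56);
}

/*
 * Hypothetical stand-in for counting set bits across an array of words,
 * one inlined popcount per word.
 */
static inline int
count_members(const uint64_t *words, size_t nwords)
{
    int         total = 0;

    for (size_t i = 0; i < nwords; i++)
        total += popcount64_plain(words[i]);
    return total;
}
```

On a platform without a native popcount instruction, the builtin would typically fall back to a library routine, so which variant wins there depends on what the compiler emits for each.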
--
John Naylor
Amazon Web Services