| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
| Cc: | Greg Burd <greg(at)burd(dot)me>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: refactor architecture-specific popcount code |
| Date: | 2026-02-20 15:39:38 |
| Message-ID: | aZiAOo25VBa6PoQi@nathan |
| Lists: | pgsql-hackers |
On Fri, Feb 20, 2026 at 03:21:05PM +0700, John Naylor wrote:
> On Thu, Feb 5, 2026 at 4:43 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>> Yeah, the plain C version might be marginally slower than the built-in
>> version for that test, but it still seems quite a bit faster than HEAD.
>>
>> HEAD v8 v10
>> 40 25 29
>
> (for the following, numbers are nanoseconds per call from
> drive_bms_num_members())
>
> Seems similar on S390X / gcc 13.3 (last week I only tested a single
> bitmapword and don't feel like repeating):
>
> master (older): 4.1083 (call builtin)
> v8: 2.8889 (inline builtin)
> v10: 2.7961 (inline pure C)
Thanks for testing it.
> On ppc64le / gcc 8.5, without native popcount it suffers:
>
> words master v14
> 1 4.5 6.5
> 2 5.8 9.7
> 64 67.9 101
> 128 143 190
>
> So one up, one down among obscure platforms. There seems to be a
> fairly thin case for the builtin anymore, although it's not zero.
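The per-call timings above grow with the number of words, presumably because a bms_num_members()-style count performs one popcount per nonzero bitmapword. Below is a minimal sketch of that loop under a simplified layout. `BitmapSketch` and `bitmap_sketch_num_members` are hypothetical names, not PostgreSQL's `Bitmapset` API. `__builtin_popcountll` is the gcc/clang builtin, i.e. the "builtin" flavor the timings compare against.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a bitmap set of 64-bit words. */
typedef struct BitmapSketch
{
	int			nwords;		/* number of words in the set */
	const uint64_t *words;	/* the words themselves */
} BitmapSketch;

/*
 * Count set bits across all words, in the spirit of bms_num_members().
 * Cost scales with nwords: each nonzero word costs one popcount.
 */
static int
bitmap_sketch_num_members(const BitmapSketch *a)
{
	int			result = 0;

	if (a == NULL)
		return 0;
	for (int i = 0; i < a->nwords; i++)
	{
		uint64_t	w = a->words[i];

		/* Zero words contribute no members, so skip the popcount. */
		if (w != 0)
			result += __builtin_popcountll(w);
	}
	return result;
}
```

Swapping the builtin for an inlined plain-C popcount changes only the per-word cost, which is why the thread's numbers track the popcount implementation so closely.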
I spent some time looking at how clang/gcc compiled the plain-C version on
various architectures [0], and I was pleasantly surprised to discover that
at some point in recent history they started automatically converting it to
special popcount instructions. I suspect that you'd see better results on
ppc64le if you upgraded the compiler...
[0] https://godbolt.org/z/v9vvx7E89
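For readers unfamiliar with the idiom, here is one common plain-C popcount formulation: the branch-free SWAR ("SIMD within a register") bit count popularized by Hacker's Delight. This is a sketch for illustration and is not necessarily the code in the patch under discussion. `popcount64_plain` is a hypothetical name. Whether a particular gcc or clang release rewrites this pattern into a native popcount instruction depends on the compiler version and target flags, which is what the Compiler Explorer link above lets you check.

```c
#include <stdint.h>

/*
 * Branch-free SWAR population count of a 64-bit word.  Some recent
 * gcc/clang releases may recognize this pattern and emit a native
 * popcount instruction when the target supports one.
 */
static inline int
popcount64_plain(uint64_t x)
{
	/* Each 2-bit field now holds the count of its own set bits. */
	x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
	/* Add adjacent 2-bit counts into 4-bit fields. */
	x = (x & UINT64_C(0x3333333333333333)) +
		((x >> 2) & UINT64_C(0x3333333333333333));
	/* Add adjacent 4-bit counts into per-byte counts (max 8 each). */
	x = (x + (x >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);
	/* The multiply sums all byte counts into the top byte. */
	return (int) ((x * UINT64_C(0x0101010101010101)) >> 56);
}
```

Because the function is `static inline` with no data-dependent branches, a compiler that does not recognize the idiom still produces a short, fixed-length instruction sequence, which is consistent with the plain-C version remaining competitive in the timings above.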
--
nathan