| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
|---|---|
| To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
| Cc: | Greg Burd <greg(at)burd(dot)me>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: refactor architecture-specific popcount code |
| Date: | 2026-02-04 21:43:50 |
| Message-ID: | aYO9llwttJEtl7er@nathan |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Feb 04, 2026 at 12:13:35PM +0700, John Naylor wrote:
> - /*
> - * We set the threshold to the point at which we'll first use special
> - * instructions in the optimized version.
> - */
> [...]
>
> It seems like we should still have some kind of comment here. Even
> just to say the 8 value was found through testing. (It was some time
> ago, right?)
Yeah, it needs a comment. I recall doing a lot of testing for small
inputs since those are what tended to regress.
> It was intentional that my PoC pg_popcount32 was simple, pure C and
> didn't have #ifdefs anymore, and I'd prefer to go back to that. This
> function is now only used for single-word bitmapsets on 32-bit
> platforms and in an assert, and even if that changes we've shown that
> inlining bitwise ops for a single word is already pretty good for
> performance. Plus, doesn't this cause gcc generate a function call on
> 32-bit x86 because "!defined(__x86_64__)" is true? That defeats the
> whole purpose of inlining in the first place. Simple is good here.
Agreed and done.
> +#elif defined(_MSC_VER)
> + return __popcnt64(word);
>
> The commit message says "converts the portable ones to inlined
> functions", but this was copied from a architecture specific file with
> a runtime check. I've seen an assertion in this thread that the
> hardware instruction is required for some Windows version, but it
> would be nice to have a link to documentation for the archives. More
> worryingly, this is almost certainly broken on 32-bit, and the
> buildfarm won't tell us -- please see commit
> 53ea2b7ad050ce4ad95c89bb55197209b65886a1 and bug report that led to
> it. Seems like material for a separate commit.
Sure. I'm tempted to suggest that we only use the plain C version here,
too. The SSE4.2 bms_num_members() test I did yesterday used it and showed
improvement at one word. If we do that, we can rip out even more code
since we no longer need the popcount built-ins.
* tests plain C version on an Apple M3 *
Yeah, the plain C version might be marginally slower than the built-in
version for that test, but it still seems quite a bit faster than HEAD.
HEAD v8 v10
40 25 29
We probably want to re-add the Neon version of pg_popcount64() so that
pg_popcount_neon() and pg_popcount_masked_neon() can use it, but that's
easy enough.
> - for (int i = 0; i < RT_BM_IDX(RT_NODE_MAX_SLOTS); i++)
> - cnt += bmw_popcount(n256->isset[i]);
> + cnt += pg_popcount((const char *) n256->isset,
> + RT_NODE_MAX_SLOTS / BITS_PER_BYTE);
>
> This can now be "cnt =". The initialization to zero is now
> unnecessary, but it's also harmless.
Fixed. The only reason I didn't make that change earlier was to keep the
patch tidy.
I haven't updated the commit messages yet. Once the code is ready to go,
I'll give those another try.
[0] https://postgr.es/m/aYJeGs-xz3NEEdSe%40nathan
--
nathan
| Attachment | Content-Type | Size |
|---|---|---|
| v10-0001-Remove-some-unnecessary-optimizations-in-popcoun.patch | text/plain | 6.9 KB |
| v10-0002-Remove-specialized-word-length-popcount-implemen.patch | text/plain | 13.8 KB |
| v10-0003-Make-use-of-pg_popcount-in-more-places.patch | text/plain | 3.0 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nathan Bossart | 2026-02-04 22:15:58 | Re: Pasword expiration warning |
| Previous Message | Euler Taveira | 2026-02-04 21:12:07 | Re: Pasword expiration warning |