| From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: refactor architecture-specific popcount code |
| Date: | 2026-02-02 14:16:42 |
| Message-ID: | CANWCAZbWLX=EDd1Bq-8oGK2ZLVNR4m4BkGe=288t2V5oLcqeZA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Sat, Jan 31, 2026 at 4:33 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Fri, Jan 30, 2026 at 03:22:45PM +0700, John Naylor wrote:
> > 0001 - I'm pretty sure this is comparable to HEAD if the optimized
> > function is pg_popcount_sse42(). Has the AVX512 version been tested
> > with 8-byte inputs? That seems to have a lot of pre- and
> > post-processing involved. The inline wrapper only bypasses for 7 or
> > less bytes.
>
> Here [0] is the latest perf data I see for the AVX-512 popcount patch,
> although that's comparing to v16, which IIRC lacks a few other inlining
> tricks. There's a chance the SSE4.2 version is faster at that particular
> length. I'm not sure we need to worry about that, but I can do a bit of
> testing if you'd like.
It might be a good idea to do a little new testing, and I see a use
for a special 8-byte path independent of AVX512: v6 seems to regress a
little for single words. But it turns out that when gcc can turn
__builtin_popcountl into a single instruction, it emits that inline,
whereas when it falls back to portable bitwise ops, it emits a call to a
function named __popcountdi2(). That call can be avoided by hand-coding
the bitwise version in C for normal builds (which also looks cleaner for
32-bit anyway), as in the attached 0005.
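For readers without the attachment at hand, here is a rough sketch of what a hand-coded single-word popcount in C can look like. This is not the contents of 0005; it is the well-known SWAR bit-counting technique, and the function names are hypothetical. Because it is a static inline function, the compiler can expand it at the call site instead of calling out to a library routine.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical name, not taken from the patch: count the set bits in one
 * 64-bit word using only portable bitwise operations.
 */
static inline int
popcount64_portable(uint64_t word)
{
	/* Each 2-bit field now holds the number of bits set in that field. */
	word = word - ((word >> 1) & UINT64_C(0x5555555555555555));
	/* Sum adjacent 2-bit fields into 4-bit fields. */
	word = (word & UINT64_C(0x3333333333333333)) +
		((word >> 2) & UINT64_C(0x3333333333333333));
	/* Sum adjacent 4-bit fields into bytes. */
	word = (word + (word >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
	/* The multiply accumulates all byte sums into the top byte. */
	return (int) ((word * UINT64_C(0x0101010101010101)) >> 56);
}

/* Illustrative sanity check, also hypothetical. */
static inline void
popcount64_portable_check(void)
{
	assert(popcount64_portable(0) == 0);
	assert(popcount64_portable(UINT64_C(0xFFFFFFFFFFFFFFFF)) == 64);
}
```

Whether this beats the out-of-line fallback on a given machine would still need to be measured.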
My laptop here is really too old to make decisions that are
micro-architecture dependent, but with that caveat, I dusted off the
popcount benchmark and added a test for counting bitmapsets (v7-0004,
applies on top of v6):
select drive_bms_num_members(10000000, 1);
master: 13.2 ticks per call
v6: 15.3
v6+v7-0005: 10.8
Again, take this with a grain of salt, but 0005 seems worth looking at.
--
John Naylor
Amazon Web Services
| Attachment | Content-Type | Size |
|---|---|---|
| v7-0005-Bypass-function-call-on-x86.patch.nocfbot | application/octet-stream | 2.9 KB |
| v7-0004-Test-module-for-popcount-plus-bitmapset-RDTSC.patch.nocfbot | application/octet-stream | 5.1 KB |