| From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | Andrew Pogrebnoi <andrew(dot)pogrebnoi(at)percona(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Popcount optimization for the slow-path lookups |
| Date: | 2025-12-08 05:49:59 |
| Message-ID: | CANWCAZYR95M_XZrR0ruNtNTBNHX68hNg=G4D3V5yNJQejgNoQg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Fri, Dec 5, 2025 at 10:40 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> I don't think the proposed improvements are relevant for either of the
> machines you used for your benchmarks. For x86, we've optimized our
> popcount code to use SSE4.2 or AVX-512, and for AArch64, we've optimized it
> to use Neon or SVE. And for other systems, we still try to use
> __builtin_popcount() and friends in the fallback paths, which IIUC are
> available on both gcc and clang (and maybe elsewhere). IMHO we need to run
> the benchmarks on a compiler/architecture combination where it would
> actually be used in practice.
Yeah, if we did anything here, I'd rather arrange things so that
architectures with unconditional hardware support can inline popcount
at compile time. I believe ppc64le and aarch64 can do that
unconditionally. For x86, we might be able to detect some symbol
defined by the compiler and do the same thing for OSes that require
such support.
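
A rough sketch of what that compile-time selection could look like, assuming
gcc or clang. The names SKETCH_HAVE_INLINE_POPCOUNT and sketch_popcount64 are
made up for illustration and are not from the PostgreSQL tree. On x86, gcc and
clang define __POPCNT__ when the build targets popcount (for example with
-mpopcnt or -msse4.2), which is one candidate for the "symbol defined by the
compiler". The bit-clearing loop is only a portable stand-in for whatever the
real fallback path would be.

```c
#include <stdint.h>

/*
 * Hypothetical macro: set when the build target is assumed to guarantee
 * a hardware popcount instruction, so no runtime check is needed.
 */
#if defined(__POPCNT__)
/* x86 built with popcount enabled, e.g. -mpopcnt or -msse4.2 */
#define SKETCH_HAVE_INLINE_POPCOUNT 1
#elif defined(__aarch64__)
/* assumption: popcount is always available via the baseline SIMD unit */
#define SKETCH_HAVE_INLINE_POPCOUNT 1
#elif defined(__powerpc64__) && defined(__LITTLE_ENDIAN__)
/* assumption: every ppc64le target has a popcount instruction */
#define SKETCH_HAVE_INLINE_POPCOUNT 1
#endif

static inline int
sketch_popcount64(uint64_t word)
{
#ifdef SKETCH_HAVE_INLINE_POPCOUNT
	/* the compiler can expand this to the hardware instruction inline */
	return __builtin_popcountll(word);
#else
	/* portable stand-in: clear the lowest set bit until none remain */
	int			count = 0;

	while (word != 0)
	{
		word &= word - 1;
		count++;
	}
	return count;
#endif
}
```

Either branch returns the same result, so callers would not need to know
which one was compiled in; only the cost of the call differs.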
--
John Naylor
Amazon Web Services