From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Popcount optimization using AVX512 |
Date: | 2024-04-01 11:06:12 |
Message-ID: | 202404011106.y4fci35kzdqt@alvherre.pgsql |
Lists: | pgsql-hackers |
On 2024-Mar-31, Nathan Bossart wrote:
> +uint64
> +pg_popcount_avx512(const char *buf, int bytes)
> +{
> +	uint64		popcnt;
> +	__m512i		accum = _mm512_setzero_si512();
> +
> +	for (; bytes >= sizeof(__m512i); bytes -= sizeof(__m512i))
> +	{
> +		const		__m512i val = _mm512_loadu_si512((const __m512i *) buf);
> +		const		__m512i cnt = _mm512_popcnt_epi64(val);
> +
> +		accum = _mm512_add_epi64(accum, cnt);
> +		buf += sizeof(__m512i);
> +	}
> +
> +	popcnt = _mm512_reduce_add_epi64(accum);
> +	return popcnt + pg_popcount_fast(buf, bytes);
> +}
Hmm, doesn't this arrangement always incur an extra function call to
pg_popcount_fast, even when no bytes are left over? Given the level of
micro-optimization being used by this code, I would have thought that
you'd have tried to avoid that. (At least, maybe avoid the call if bytes
is 0, no?)
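
The guard being suggested could look roughly like the sketch below. It is not the patch itself and only illustrates the control flow. To stay runnable without AVX-512 hardware, the vector loop is replaced by a scalar loop over 64-byte chunks. The names chunked_popcount, tail_popcount, tail_calls and CHUNK are hypothetical stand-ins (tail_popcount plays the role of pg_popcount_fast), and __builtin_popcount/__builtin_popcountll assume GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: same width as one __m512i, so the loop shape matches. */
#define CHUNK 64

/* Hypothetical counter, only here to make the extra call observable. */
static int	tail_calls = 0;

/* Stand-in for pg_popcount_fast(): counts the leftover bytes one by one. */
static uint64_t
tail_popcount(const char *buf, int bytes)
{
	uint64_t	popcnt = 0;

	tail_calls++;
	while (bytes-- > 0)
		popcnt += (uint64_t) __builtin_popcount((unsigned char) *buf++);
	return popcnt;
}

/* Stand-in for the AVX-512 function: scalar code over 64-byte chunks. */
static uint64_t
chunked_popcount(const char *buf, int bytes)
{
	uint64_t	popcnt = 0;

	assert(bytes >= 0);

	for (; bytes >= CHUNK; bytes -= CHUNK)
	{
		uint64_t	words[CHUNK / sizeof(uint64_t)];

		/* memcpy mirrors the unaligned load in the quoted code. */
		memcpy(words, buf, CHUNK);
		for (size_t i = 0; i < CHUNK / sizeof(uint64_t); i++)
			popcnt += (uint64_t) __builtin_popcountll(words[i]);
		buf += CHUNK;
	}

	/* The guard in question: skip the call when nothing is left over. */
	if (bytes > 0)
		popcnt += tail_popcount(buf, bytes);
	return popcnt;
}
```

With this shape, an input whose length is an exact multiple of 64 bytes never reaches the tail function. Whether the added branch is actually cheaper than an unconditional call would have to be measured.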
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)