Re: Popcount optimization using AVX512

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Popcount optimization using AVX512
Date: 2024-04-05 12:58:44
Message-ID: 20240405125844.GB4102502@nathanxps13
Lists: pgsql-hackers

On Fri, Apr 05, 2024 at 10:33:27AM +0300, Ants Aasma wrote:
> The main issue I saw was that clang was able to peel off the first
> iteration of the loop, then eliminate the mask assignment and
> replace the masked load with a memory operand for vpopcntq. I was not
> able to convince gcc to do that regardless of optimization options.
> Generated code for the inner loop:
>
> clang:
> <L2>:
> 50: add rdx, 64
> 54: cmp rdx, rdi
> 57: jae <L1>
> 59: vpopcntq zmm1, zmmword ptr [rdx]
> 5f: vpaddq zmm0, zmm1, zmm0
> 65: jmp <L2>
>
> gcc:
> <L1>:
> 38: kmovq k1, rdx
> 3d: vmovdqu8 zmm0 {k1} {z}, zmmword ptr [rax]
> 43: add rax, 64
> 47: mov rdx, -1
> 4e: vpopcntq zmm0, zmm0
> 54: vpaddq zmm0, zmm0, zmm1
> 5a: vmovdqa64 zmm1, zmm0
> 60: cmp rax, rsi
> 63: jb <L1>
>
> I'm not sure how much that matters in practice. Attached is a patch
> that does this manually, giving essentially the same result with gcc.
> As most distro packages are built using gcc, I think it would make
> sense to have the extra code if it gives a noticeable benefit for
> large cases.

Yeah, I did see this, but I also wasn't sure whether it was worth further
complicating the code. I can run the benchmarks with and without your fix
and see whether it makes any difference.
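The loop peeling Ants describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not his attached patch or the PostgreSQL code under discussion: the function names (popcount_bytes, popcount_avx512_peeled, popcount_scalar) are hypothetical, it assumes GCC or Clang on x86-64, and it assumes the buffer start is aligned down to a 64-byte boundary. Only the first block needs a mask to ignore bytes before the buffer, so that iteration is pulled out of the loop, and the steady-state loop uses plain aligned loads with no mask register, mirroring what clang did on its own. A runtime CPU check with a scalar fallback keeps the sketch runnable on machines without AVX-512.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable reference: count set bits one byte at a time. */
static uint64_t
popcount_scalar(const unsigned char *buf, size_t bytes)
{
    uint64_t total = 0;

    for (size_t i = 0; i < bytes; i++)
    {
        unsigned v = buf[i];

        while (v)
        {
            v &= v - 1;         /* clear the lowest set bit */
            total++;
        }
    }
    return total;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

/*
 * Hypothetical AVX-512 popcount with the first iteration peeled.
 * Assumption: buf is aligned down to 64 bytes; masked-off bytes before
 * buf and past its end are never actually read, because AVX-512 masked
 * loads suppress faults for masked-off elements.  Requires bytes > 0.
 */
static __attribute__((target("avx512f,avx512bw,avx512vpopcntdq"))) uint64_t
popcount_avx512_peeled(const unsigned char *buf, size_t bytes)
{
    assert(bytes > 0);

    __m512i     accum = _mm512_setzero_si512();
    uintptr_t   start = (uintptr_t) buf;
    uintptr_t   last = start + bytes - 1;           /* address of final byte */
    uintptr_t   block = start & ~(uintptr_t) 63;    /* align start down */
    uintptr_t   final = last & ~(uintptr_t) 63;     /* block holding last byte */
    __mmask64   mask = ~(__mmask64) 0 << (start & 63);  /* skip head bytes */
    __m512i     val;

    if (block < final)
    {
        /* Peeled first iteration: the only one that needs the head mask. */
        val = _mm512_maskz_loadu_epi8(mask, (const void *) block);
        accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(val));
        block += 64;
        mask = ~(__mmask64) 0;

        /* Steady state: full, aligned, unmasked loads. */
        for (; block < final; block += 64)
        {
            val = _mm512_load_si512((const void *) block);
            accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(val));
        }
    }

    /* Final block: also drop the bytes past the end of the buffer. */
    mask &= ~(__mmask64) 0 >> (63 - (last & 63));
    val = _mm512_maskz_loadu_epi8(mask, (const void *) block);
    accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(val));

    return (uint64_t) _mm512_reduce_add_epi64(accum);
}
#endif

/* Hypothetical dispatcher: use AVX-512 only if the CPU supports it. */
static uint64_t
popcount_bytes(const unsigned char *buf, size_t bytes)
{
    if (bytes == 0)
        return 0;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("avx512f") &&
        __builtin_cpu_supports("avx512bw") &&
        __builtin_cpu_supports("avx512vpopcntdq"))
        return popcount_avx512_peeled(buf, bytes);
#endif
    return popcount_scalar(buf, bytes);
}
```

On a CPU without AVX-512 VPOPCNTDQ the dispatcher just falls back to the byte-wise loop. The sketch makes no performance claim; whether the peeled form is measurably faster when built with gcc is what the benchmark comparison would have to show.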

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
