| From: | Ants Aasma <ants(dot)aasma(at)cybertec(dot)at> |
|---|---|
| To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
| Cc: | Andrew Kim <tenistarkim(at)gmail(dot)com>, Oleg Tselebrovskiy <o(dot)tselebrovskiy(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Proposal for enabling auto-vectorization for checksum calculations |
| Date: | 2026-03-30 15:00:59 |
| Message-ID: | CANwKhkMN31RoNab8ovJjZaW=o6CNHCu-rznk85wKO=L5z5-PSA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Mon, 30 Mar 2026 at 15:01, John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> I don't remember the last time anyone did measurements, so I went
> ahead and did that:
>
> master: 945ms
> 32 AVX2: 335ms
> 64 AVX2: 220ms
I'm guessing this is on a recent Intel. Any extra width is helpful on Intel
as they doubled vpmulld latency from under us after we had settled on this
algorithm. uops.info shows that the most recent Arrow Lake-P cores bring
the latency down to 5. Intel's product lineup is so confusing that it's
hard to tell which products this core ships in. As far as I can tell not in
any Xeons yet. AMD has had 3 cycle vpmulld since Zen 3.
Out of curiosity I tried some approximate numbers on Zen 5 for differing
N_SUMS values. Numbers are ns per iteration for 10M iterations.
GCC 15.2 -O3:
             n16    n32    n64   n128   n256
x86-64     620.1  482.4  493.9  543.1  584.0
x86-64-v2  188.6  125.5  121.3  183.9  196.6
x86-64-v3  185.2  101.3   63.2   60.9  101.6
x86-64-v4  182.9   86.0   53.9   35.4   30.5
native     178.2   84.7   54.0   34.5   30.9
clang 20.1 -O3:
             n16    n32    n64   n128   n256
x86-64     611.7  264.0  254.7  283.9  304.0
x86-64-v2  603.7  134.0  137.9  236.1  165.8
x86-64-v3  252.1  103.2   61.9  124.0   96.9
x86-64-v4  223.9  102.1   61.4  101.7   68.9
native     203.3   91.0   54.5   35.0   40.4
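For readers who want a feel for what N_SUMS changes, here is a minimal sketch of an FNV-1a-style block checksum with N_SUMS independent partial sums, plus a crude ns-per-iteration timer. It is not the attached bench-checksums.c. The names ending in _sketch, the 8192-byte block size, the placeholder lane seeds and the use of clock() are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the N_SUMS knob; override with -DN_SUMS=64 etc. */
#ifndef N_SUMS
#define N_SUMS 32
#endif

#define FNV_PRIME   16777619u
#define BLOCK_BYTES 8192u                       /* assumed block size */
#define BLOCK_WORDS (BLOCK_BYTES / sizeof(uint32_t))

/* One mixing step: xor in the value, multiply, fold the high bits down. */
static inline uint32_t
mix_sketch(uint32_t sum, uint32_t value)
{
    uint32_t tmp = sum ^ value;

    return (tmp * FNV_PRIME) ^ (tmp >> 17);
}

/* Checksum one block of BLOCK_WORDS words; N_SUMS must divide BLOCK_WORDS. */
static uint32_t
checksum_block_sketch(const uint32_t *data)
{
    uint32_t sums[N_SUMS];
    uint32_t result = 0;
    size_t   i, j;

    /* Placeholder seeds (assumption); real code would use fixed constants. */
    for (j = 0; j < N_SUMS; j++)
        sums[j] = (uint32_t) j + 1;

    /*
     * Each lane j depends only on its own previous value, so the inner
     * loop is a candidate for auto-vectorization; a larger N_SUMS gives
     * more independent multiply chains to hide the multiply latency.
     */
    for (i = 0; i < BLOCK_WORDS / N_SUMS; i++)
        for (j = 0; j < N_SUMS; j++)
            sums[j] = mix_sketch(sums[j], data[i * N_SUMS + j]);

    /* Fold the partial sums together. */
    for (j = 0; j < N_SUMS; j++)
        result ^= sums[j];

    return result;
}

/* Keeps the compiler from discarding the checksum calls. */
static volatile uint32_t sink_sketch;

/* Rough CPU-time measurement in ns per checksum call. */
static double
ns_per_iteration_sketch(uint32_t *data, long iterations)
{
    clock_t start, end;
    long    n;

    start = clock();
    for (n = 0; n < iterations; n++)
    {
        data[0] = (uint32_t) n;     /* vary the input between calls */
        sink_sketch = checksum_block_sketch(data);
    }
    end = clock();

    return (double) (end - start) / CLOCKS_PER_SEC * 1e9 / (double) iterations;
}
```

Assuming the row labels in the tables are -march levels, something like `-O3 -march=x86-64-v3 -DN_SUMS=64` would correspond to one cell. A sketch like this will not reproduce the exact figures.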
FWIW I think AVX2 (x86-64-v3) is fine. On AMD the speed is close to
core-to-fabric bandwidth, and Intel has significantly less bandwidth on
server chips.
Regards,
Ants Aasma
| Attachment | Content-Type | Size |
|---|---|---|
| bench-checksums.c | text/x-csrc | 1023 bytes |