From: | Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andy Fan <zhihuifan1213(at)163(dot)com>, "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, Jesper Pedersen <jesperpedersen(dot)db(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Improve CRC32C performance on SSE4.2 |
Date: | 2025-07-13 19:28:11 |
Message-ID: | CAE-ML+-X8mnx-AsD-9QtB7rkWvCmcb4+VJWOrg0KPu5K2mucSA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jun 17, 2025 at 1:55 AM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
I took the minimal repro from [1] and compared the code generated by
clang 17 at -O0 [2] and at -O3 [3]. At -O3 (and also at -O1 and -O2),
the following source:

    castval = _mm512_castsi128_si512(_mm_cvtsi32_si128(crc0));
    x0 = _mm512_xor_si512(castval, x0);

compiles to:

    vinserti128 ymm0, ymm0, xmmword ptr [rip + .LCPI1_0], 0
    vpxorq zmm0, zmm0, zmmword ptr [rdi]
Reading the pseudocode for vpxorq [4], it appears to zero the upper bits
of the destination register:

    DEST[MAXVL-1:VL] := 0

The same seems to hold for clang 17 at -O0 when _mm512_zextsi128_si512 is
used instead [5]: vpxor and vbroadcast128 are emitted, which also appear
to zero the upper bits.
So -O1 through -O3 were indeed emitting instructions that zero-extend,
thereby avoiding the undefined behavior.
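
For illustration, here is a minimal sketch (not taken from the patch under discussion) of the explicit zero-extending form. It widens a 32-bit value with _mm512_zextsi128_si512 and checks that every bit above the low 32 is zero. The helper names zext_upper_bits_clear and check_zext are hypothetical, and the sketch assumes GCC or clang on x86-64, since it relies on the target attribute and __builtin_cpu_supports to avoid executing AVX-512 instructions on hardware that lacks them.

```c
#include <immintrin.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the thread: widen a 32-bit seed to 512 bits
 * using the zero-extending intrinsic, store the vector to memory, and verify
 * that only the lowest 32-bit lane is populated.
 */
__attribute__((target("avx512f")))
static int
zext_upper_bits_clear(uint32_t crc0)
{
    uint32_t lanes[16];
    __m512i  v = _mm512_zextsi128_si512(_mm_cvtsi32_si128((int) crc0));

    _mm512_storeu_si512((void *) lanes, v);

    if (lanes[0] != crc0)
        return 0;
    for (int i = 1; i < 16; i++)
    {
        if (lanes[i] != 0)
            return 0;
    }
    return 1;
}

/*
 * Returns 1 if the upper bits are zero, 0 if they are not, and -1 if the
 * CPU (or OS) does not support AVX-512F, in which case nothing is executed.
 */
static int
check_zext(uint32_t crc0)
{
    if (!__builtin_cpu_supports("avx512f"))
        return -1;
    return zext_upper_bits_clear(crc0);
}
```

With _mm512_zextsi128_si512 the zeroing is part of the intrinsic's documented behavior, whereas with _mm512_castsi128_si512 the upper bits are undefined and the result depends on which instructions the compiler happens to pick at a given optimization level.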
[1] https://www.postgresql.org/message-id/PH8PR11MB8286A89AF2B104044187E54DFB70A%40PH8PR11MB8286.namprd11.prod.outlook.com
[2] https://godbolt.org/z/ahx9PePYr
[3] https://godbolt.org/z/W4WPzjnbb
[4] https://www.felixcloutier.com/x86/pxor#vpxorq--evex-encoded-versions-
[5] https://godbolt.org/z/46brvrnnv
Regards,
Deep (VMware)