| From: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com> |
|---|---|
| To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Cc: | John Naylor <johncnaylorls(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, Jesper Pedersen <jesperpedersen(dot)db(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
| Subject: | RE: Improve CRC32C performance on SSE4.2 |
| Date: | 2025-06-16 22:20:59 |
| Message-ID: | PH8PR11MB82866B07AA6758D12F699C00FB70A@PH8PR11MB8286.namprd11.prod.outlook.com |
| Lists: | pgsql-hackers |
Great catch! From the Intel intrinsics guide entry for _mm512_castsi128_si512:
"Cast vector of type __m128i to type __m512i; the upper 384 bits of the result are undefined."
Replacing that with _mm512_zextsi128_si512, which zeroes the upper bits, fixes the problem.
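For reference, here is a minimal sketch of the zero-extending variant. This is not the attached reproducer or the patch; the function names zext_check_impl and zext_upper_bits_are_zero are made up for illustration, and it assumes GCC or clang on x86-64 (it uses the target attribute and __builtin_cpu_supports). It widens a 128-bit value and checks that the upper 384 bits come out as zero, which is what the cast variant does not promise.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h>

/*
 * Hypothetical helper: widen a 128-bit vector to 512 bits with explicit
 * zero-extension and verify the result lane by lane.
 * Returns 1 if the low 128 bits are preserved and the upper 384 are zero.
 */
__attribute__((target("avx512f")))
static int
zext_check_impl(void)
{
    /* _mm_set_epi64x takes the high 64 bits first, then the low 64 bits. */
    const __m128i lo = _mm_set_epi64x(0x1122334455667788LL,
                                      0x0102030405060708LL);
    /* Unlike _mm512_castsi128_si512, this guarantees zeroed upper bits. */
    const __m512i wide = _mm512_zextsi128_si512(lo);
    uint64_t    lanes[8];
    int         i;

    _mm512_storeu_si512(lanes, wide);

    if (lanes[0] != UINT64_C(0x0102030405060708) ||
        lanes[1] != UINT64_C(0x1122334455667788))
        return 0;

    for (i = 2; i < 8; i++)
    {
        if (lanes[i] != 0)
            return 0;
    }
    return 1;
}
#endif

/*
 * Returns 1 on success, 0 on a mismatch, and -1 if AVX-512F is not
 * available on the running CPU (or the build is not x86-64).
 */
int
zext_upper_bits_are_zero(void)
{
#if defined(__x86_64__)
    if (!__builtin_cpu_supports("avx512f"))
        return -1;
    return zext_check_impl();
#else
    return -1;
#endif
}
```

Running the same check against _mm512_castsi128_si512 would not be a valid test either way, since the upper lanes are undefined and may happen to be zero with one compiler or optimization level and not with another.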
> -----Original Message-----
> From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
> Sent: Monday, June 16, 2025 3:14 PM
> To: Devulapalli, Raghuveer <raghuveer(dot)devulapalli(at)intel(dot)com>
> Cc: John Naylor <johncnaylorls(at)gmail(dot)com>; Andy Fan
> <zhihuifan1213(at)163(dot)com>; Jesper Pedersen <jesperpedersen(dot)db(at)gmail(dot)com>;
> Tomas Vondra <tomas(at)vondra(dot)me>; pgsql-hackers(at)lists(dot)postgresql(dot)org;
> Shankaran, Akash <akash(dot)shankaran(at)intel(dot)com>
> Subject: Re: Improve CRC32C performance on SSE4.2
>
> On Mon, Jun 16, 2025 at 06:31:11PM +0000, Devulapalli, Raghuveer wrote:
> > Attached is a simple reproducer. It passes with clang v16 -O0, but
> > fails with 17 and 18 only when built with -O0.
>
> I've just started looking into this, but the difference in code generated for
> _mm512_castsi128_si512() between gcc, clang 16, and clang 17 looks interesting.
>
> --
> nathan