RE: CRC32C Parallel Computation Optimization on ARM

From: Xiang Gao <Xiang(dot)Gao(at)arm(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: CRC32C Parallel Computation Optimization on ARM
Date: 2023-11-03 10:46:57
Message-ID: DB9PR08MB69911391D134245352F01343F5A5A@DB9PR08MB6991.eurprd08.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Date: Thu, 2 Nov 2023 09:35:50AM -0500, Nathan Bossart wrote:

>On Thu, Nov 02, 2023 at 06:17:20AM +0000, Xiang Gao wrote:
>> After reading the discussion, I understand that in order to avoid performance
>> regression in some instances, we need to try our best to avoid runtime checks.
> >I don't know if I understand it correctly.

>The idea is that we don't want to start forcing runtime checks on builds
>where we aren't already doing runtime checks. IOW if the compiler can use
>the ARMv8 CRC instructions with the default compiler flags, we should only
>use vmull_p64() if it can also be used with the default compiler flags. I
>suspect this limitation sounds worse than it actually is in practice. The
>vast majority of the buildfarm uses runtime checks, and at least some of
>the platforms that don't, such as the Apple M-series machines, seem to
>include +crypto by default.

>Of course, if a compiler picks up +crc but not +crypto in its defaults, we
>could lose the vmull_p64() optimization on that platform. But as noted in
>the other thread, we can revisit if these kinds of hypothetical situations
>become reality.

>> Could you please give me some suggestions about how to refine this patch?

>Of course. I think we'll ultimately want to independently check for the
>availability of the new instruction like we do for the other sets of
>intrinsics:
>
> PGAC_ARMV8_VMULL_INTRINSICS([])
> if test x"$pgac_armv8_vmull_intrinsics" != x"yes"; then
> PGAC_ARMV8_VMULL_INTRINSICS([-march=armv8-a+crypto])
> fi
>
>My current thinking is that we'll want to add USE_ARMV8_VMULL and
>USE_ARMV8_VMULL_WITH_RUNTIME_CHECK and use those to decide exactly what to
>compile. I'll admit I haven't fully thought through every detail yet, but
>I'm cautiously optimistic that we can avoid too much complexity in the
>autoconf/meson scripts.

Thank you so much!
This is the newest patch, I think the code for which crc algorithm to choose is a bit complicated. Maybe we can just use USE_ARMV8_VMULL only, and do runtime checks on the vmull_p64 instruction at all times. This will not affect the existing builds, because this is a new instruction and new logic. In addition, it can also reduce the complexity of the code.
Very much looking forward to receiving your suggestions, thank you!
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Attachment Content-Type Size
0005-crc32c-parallel-computation-optimization-on-arm.patch application/octet-stream 21.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2023-11-03 11:16:05 Re: Popcount optimization using AVX512
Previous Message vignesh C 2023-11-03 10:45:45 Re: pg_upgrade and logical replication