RE: CRC32C Parallel Computation Optimization on ARM

From: Xiang Gao <Xiang(dot)Gao(at)arm(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: CRC32C Parallel Computation Optimization on ARM
Date: 2023-10-26 07:28:35
Message-ID: DB9PR08MB6991EA677FD4FC35914D541EF5DDA@DB9PR08MB6991.eurprd08.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 25 Oct, 2023 at 10:43:25 -0500, Nathan Bossart wrote:
>+pg_crc32c
>+pg_comp_crc32c_with_vmull_armv8(pg_crc32c crc, const void *data, size_t len)

>It looks like most of this function is duplicated from
>pg_comp_crc32c_armv8(). I understand that we probably need a separate
>function because of the runtime check, but perhaps we could create a common
>static inline helper function with a branch for when vmull_p64() can be
>used. It's callers would then just provide a boolean to indicate which
>branch to take.

I have modified and remade the patch.

>+# Use ARM VMULL if available and ARM CRC32C intrinsic is avaliable too.
>+if test x"$USE_ARMV8_VMULL" = x"" && (test x"$USE_ARMV8_CRC32C" = x"1" || test x"$USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK" = x"1"); then
>+ if test x"$pgac_armv8_vmull_intrinsics" = x"yes"; then
>+ USE_ARMV8_VMULL=1
>+ fi
>+fi

>Hm. I wonder if we need to switch to a runtime check in some cases. For
>example, what happens if the ARMv8 intrinsics used today are found with the
>default compiler flags, but vmull_p64() is only available if
>-march=armv8-a+crypto is added? It looks like the precedent is to use a
>runtime check if we need extra CFLAGS to produce code that uses the
>intrinsics.

We consider that a runtime check needs to be done in any scenario.
Here we only confirm that the compilation can be successful.
A runtime check will be done when choosing which algorithm.
You can think of us as merging USE_ARMV8_VMULL and USE_ARMV8_VMULL_WITH_RUNTIME_CHECK into USE_ARMV8_VMULL.

>Separately, I wonder if we should just always do runtime checks for the CRC
>stuff whenever we can produce code with the intrinics, regardless of
>whether we need extra CFLAGS. The check doesn't look terribly expensive,
>and it might allow us to simplify the code a bit (especially now that we
>support a few different architectures).

Yes, I think so. USE_ARMV8_CRC32C only means that the compilation is successful,
and it does not guarantee that it can run correctly on the local machine.
Therefore, a runtime check is required during actual operation.
Based on the principle of minimal changes, we plan to fix it in the next patch.
If the community agrees, we will continue to improve it later, such as merging x86 and arm code, etc.
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Attachment Content-Type Size
0003-crc32c-parallel-computation-optimization-on-arm.patch application/octet-stream 16.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-10-26 07:31:19 Re: Is this a problem in GenericXLogFinish()?
Previous Message Rushabh Lathia 2023-10-26 07:10:21 Parallel query behaving different with custom GUCs