Re: Optimize Arm64 crc32c implementation in Postgresql

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Yuqi Gu <Yuqi(dot)Gu(at)arm(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize Arm64 crc32c implementation in Postgresql
Date: 2018-04-03 16:38:42
Message-ID: 37204430-76fb-0eaa-06d9-dbf4f6473c99@iki.fi
Lists: pgsql-hackers

On 03/04/18 19:09, Andres Freund wrote:
> Hi,
>
> On 2018-04-03 19:05:19 +0300, Heikki Linnakangas wrote:
>> On 01/04/18 20:32, Andres Freund wrote:
>>> On 2018-03-06 02:44:35 +0800, Heikki Linnakangas wrote:
>>>> * I tested this on Linux, with gcc and clang, on an ARM64 virtual machine
>>>> that I had available (not an emulator, but a VM on a shared ARM64 server).
>>>
>>> Have you seen actual postgres performance benefits with the patch?
>>
>> I just ran a small test with pg_waldump, similar to what Abhijit Menon-Sen
>> ran with the Slicing-by-8 and Intel SSE patches, when we added those
>> (https://www.postgresql.org/message-id/20141119155811.GA32492%40toroid.org).
>> I ran pgbench, with scale factor 5, until it had generated about 1 GB of
>> WAL, and then I ran pg_waldump -z on that WAL. With slicing-by-8, it took
>> about 7 s, and with the special CPU instructions, about 5 s. 'perf' showed
>> that the CRC computation took about 30% of the CPU time before, and about
>> 12% after, which sounds about right. That's not as big a speedup as we saw
>> with the corresponding Intel SSE instructions back in 2014, but still quite
>> worthwhile.
>
> Cool. Based on a skim the patch looks reasonable.

Thanks.

I bikeshedded with myself on the naming of things, and decided to use
"ARMv8" in the variable and file names, instead of ARM64 or ARMCE or
ARM64CE. The CRC instructions were introduced in ARMv8 (as an optional
feature); that's not really related to the 64-bitness, even though the
64-bit instruction set was also introduced in ARMv8. Other than that,
and some comment fixes, this is the same as the previous patch version.
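
Because the CRC instructions are an optional ARMv8 feature, code that uses them has to confirm they are present, either at compile time or at run time. The following is a minimal sketch assuming GCC or Clang and Linux/glibc on AArch64. It does not necessarily reflect how the attached patch handles this, and the helper name armv8_crc32c_available() is hypothetical. __ARM_FEATURE_CRC32 is the ACLE predefined macro, and getauxval(AT_HWCAP) with HWCAP_CRC32 is the Linux hardware-capability interface.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP; glibc also provides HWCAP_* here */
#endif

/*
 * Hypothetical helper: report whether the ARMv8 CRC32C instructions can be
 * used. Without the compile-time macro, only AArch64 Linux is probed at
 * run time; any other platform reports false.
 */
bool armv8_crc32c_available(void)
{
#if defined(__ARM_FEATURE_CRC32)
    /* Compiled for a target on which the CRC extension is guaranteed. */
    return true;
#elif defined(__linux__) && defined(__aarch64__) && defined(HWCAP_CRC32)
    /* Ask the kernel whether this CPU implements the optional extension. */
    return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;
#else
    /* No way to tell here: assume the instructions are absent. */
    return false;
#endif
}
```

A run-time check only pays off if the CRC code itself is compiled with the extension enabled (for example, in a separate file built with -march=armv8-a+crc) while the rest of the binary stays generic. Otherwise the compile-time macro alone is enough.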

I was just about to commit this, when I started to wonder: Do we need to
worry about alignment? As the patch stands, it will merrily do unaligned
8-byte loads. Is that OK on ARM? It seems to work on the system I've
been testing on, but I don't know. And even if it's OK, would it perform
better if we did 1-byte loads in the beginning, until we reach the
8-byte boundary?
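
One way to try the second option is to process the misaligned head one byte at a time, then do only aligned 8-byte loads, then finish the tail byte-wise. The sketch below illustrates that shape under stated assumptions and is not the patch's code. The function names are hypothetical. It uses the ACLE intrinsics __crc32cb() and __crc32cd() from <arm_acle.h> when __ARM_FEATURE_CRC32 is defined, and falls back to a slow bitwise CRC-32C elsewhere so that it compiles anywhere. It also assumes a little-endian target. As with PostgreSQL's INIT_CRC32C/FIN_CRC32C macros, the caller supplies the initial 0xFFFFFFFF and the final inversion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#endif

/* One byte of CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c_u8(uint32_t crc, uint8_t b)
{
#if defined(__ARM_FEATURE_CRC32)
    return __crc32cb(crc, b);                   /* ARMv8 CRC32CB instruction */
#else
    crc ^= b;                                   /* portable bit-at-a-time fallback */
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
#endif
}

/* Eight bytes at once; v is assumed to have been loaded little-endian. */
static uint32_t crc32c_u64(uint32_t crc, uint64_t v)
{
#if defined(__ARM_FEATURE_CRC32)
    return __crc32cd(crc, v);                   /* ARMv8 CRC32CD instruction */
#else
    for (int i = 0; i < 8; i++)                 /* lowest byte = first byte in memory */
        crc = crc32c_u8(crc, (uint8_t) (v >> (8 * i)));
    return crc;
#endif
}

/* Hypothetical name: update crc over len bytes using only aligned 8-byte loads. */
uint32_t crc32c_update_aligned(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    const unsigned char *end = p + len;

    /* Head: single bytes until p reaches an 8-byte boundary. */
    while (p < end && ((uintptr_t) p & 7) != 0)
        crc = crc32c_u8(crc, *p++);

    /* Body: aligned 8-byte words; memcpy compiles to a plain load. */
    while (end - p >= 8)
    {
        uint64_t v;

        memcpy(&v, p, sizeof(v));
        crc = crc32c_u64(crc, v);
        p += 8;
    }

    /* Tail: the remaining 0-7 bytes. */
    while (p < end)
        crc = crc32c_u8(crc, *p++);

    return crc;
}
```

For example, crc32c_update_aligned(0xFFFFFFFF, "123456789", 9) ^ 0xFFFFFFFF yields 0xE3069283, the standard CRC-32C check value, regardless of where the string happens to be aligned. The sketch only guarantees that the 8-byte loads are aligned. Whether that is actually faster than unaligned loads on a given ARM core is something to measure.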

- Heikki

Attachment: armv8-crc32c-2.patch (text/x-patch, 23.6 KB)
