Re: Optimize Arm64 crc32c implementation in Postgresql

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Yuqi Gu <Yuqi(dot)Gu(at)arm(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize Arm64 crc32c implementation in Postgresql
Date: 2018-04-03 18:56:12
Message-ID: 0E10D81B-4653-429D-AB17-665E0474A00B@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 03 Apr 2018, at 18:38, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 03/04/18 19:09, Andres Freund wrote:
>> Hi,
>> On 2018-04-03 19:05:19 +0300, Heikki Linnakangas wrote:
>>> On 01/04/18 20:32, Andres Freund wrote:
>>>> On 2018-03-06 02:44:35 +0800, Heikki Linnakangas wrote:
>>>>> * I tested this on Linux, with gcc and clang, on an ARM64 virtual machine
>>>>> that I had available (not an emulator, but a VM on a shared ARM64 server).
>>>>
>>>> Have you seen actual postgres performance benefits with the patch?
>>>
>>> I just ran a small test with pg_waldump, similar to what Abhijit Menon-Sen
>>> ran with the Slicing-by-8 and Intel SSE patches, when we added those
>>> (https://www.postgresql.org/message-id/20141119155811.GA32492%40toroid.org).
>>> I ran pgbench, with scale factor 5, until it had generated about 1 GB of
>>> WAL, and then I ran pg_waldump -z on that WAL. With slicing-by-8, it took
>>> about 7 s, and with the special CPU instructions, about 5 s. 'perf' showed
>>> that the CRC computation took about 30% of the CPU time before, and about
>>> 12% after, which sounds about right. That's not as big a speedup as we saw
>>> with the corresponding Intel SSE instructions back in 2014, but still quite
>>> worthwhile.
>> Cool. Based on a skim the patch looks reasonable.
>
> Thanks.
>
> I bikeshedded with myself on the naming of things, and decided to use "ARMv8" in the variable and file names, instead of ARM64 or ARMCE or ARM64CE. The CRC instructions were introduced in ARM v8 (as an optional feature), it's not really related to the 64-bitness, even though the 64-bit instruction set was also introduced in ARM v8. Other than that, and some comment fixes, this is the same as the previous patch version.

Noticed two minor documentation issues in a quick skim of the attached patch:

The following line should say SSE and not SSSE (and the same typo is in
src/include/pg_config.h.in and src/include/pg_config.h.win32). While not
introduced in this patch, it’s adjacent to the patched codepath and on topic so
may well be fixed while in there.

AC_DEFINE(USE_SSE42_CRC32C_WITH_RUNTIME_CHECK, 1, [Define to 1 to use Intel SSSE 4.2 CRC instructions with a runtime check.])

The documentation in configure.in use “runtime” rather than "run time” in all
lines except this one:

+# uses the CRC instructions, compile both, and select at run time.

cheers ./daniel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-04-03 19:11:35 Re: Foreign keys and partitioned tables
Previous Message Tom Lane 2018-04-03 18:55:07 Re: pgsql: Validate page level checksums in base backups