Re: Optimize Arm64 crc32c implementation in Postgresql

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Yuqi Gu <Yuqi(dot)Gu(at)arm(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Optimize Arm64 crc32c implementation in Postgresql
Date:	2018-04-04 09:23:36
Message-ID:	e3a105f2-4fa3-802a-5db3-f0e062f61076@iki.fi
Lists:	pgsql-hackers

On 03/04/18 19:43, Andres Freund wrote:
> Architecture manual time? They're available freely IIRC and should
> answer this.

Yeah. The best reference I could find was "ARM Cortex-A Series
Programmer’s Guide for ARMv8-A"
(http://infocenter.arm.com/help/topic/com.arm.doc.den0024a/ch08s01.html).
In the "Porting to A64" section, it says:

> Data and code must be aligned to appropriate boundaries. The
> alignment of accesses can affect performance on ARM cores and can
> represent a portability problem when moving code from an earlier
> architecture to ARMv8-A. It is worth being aware of alignment issues
> for performance reasons, or when porting code that makes assumptions
> about pointers or 32-bit and 64-bit integer variables.

I was a bit surprised by the "must be aligned to appropriate boundaries"
statement. From what I found by googling around, the strict alignment
requirement was removed in ARMv7, and since then, unaligned access works
much as it does on Intel. I think some special instructions, like atomic
ops, still require alignment, though. Perhaps that's what that sentence
refers to.

On 03/04/18 20:47, Tom Lane wrote:
> I'm pretty sure that some ARM platforms emulate unaligned access through
> kernel trap handlers, which would certainly make this a lot slower than
> handling the unaligned bytes manually. Maybe that doesn't apply to any
> ARM CPU that has this instruction ... but as you said, it'd be better
> to consider the presence of the instruction as orthogonal to other
> CPU features.

I did some quick testing, and found that unaligned access is about 2x
slower than aligned. I don't think it's being trapped by the kernel; I
think that would be even slower. But clearly there is an effect there.
So I added code to process the first 1-7 bytes separately, so that the
main loop runs on 8-byte aligned addresses.
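
For illustration, the approach described above can be sketched roughly as
follows. This is not the committed PostgreSQL code: the function names
(crc32c_byte, crc32c_word8, crc32c_aligned, crc32c) are hypothetical, and a
portable bitwise CRC-32C step stands in for the ARMv8 CRC instructions so
that the sketch compiles on any platform. Only the shape matters here: a
short byte-wise prologue, an 8-byte main loop that starts on an aligned
address, and a byte-wise tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32C (Castagnoli) polynomial. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* One byte step. Assumption: stands in for a hardware byte-wide CRC instruction. */
static uint32_t
crc32c_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    return crc;
}

/* One 8-byte step. Assumption: stands in for a hardware 64-bit CRC instruction. */
static uint32_t
crc32c_word8(uint32_t crc, const unsigned char *p)
{
    for (int i = 0; i < 8; i++)
        crc = crc32c_byte(crc, p[i]);
    return crc;
}

/* Update a running CRC, keeping the main loop on 8-byte aligned addresses. */
uint32_t
crc32c_aligned(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    const unsigned char *end = p + len;

    /* Prologue: consume leading bytes one at a time until p is 8-byte aligned. */
    while (p < end && ((uintptr_t) p & 7) != 0)
        crc = crc32c_byte(crc, *p++);

    /* Main loop: every 8-byte chunk now starts on an aligned address. */
    while (end - p >= 8)
    {
        assert(((uintptr_t) p & 7) == 0);
        crc = crc32c_word8(crc, p);
        p += 8;
    }

    /* Tail: whatever is left after the last full 8-byte chunk. */
    while (p < end)
        crc = crc32c_byte(crc, *p++);

    return crc;
}

/* Convenience wrapper applying the usual initial value and final inversion. */
uint32_t
crc32c(const void *data, size_t len)
{
    return crc32c_aligned(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

The result does not depend on where the buffer starts; only the split
between prologue, main loop and tail changes. For example,
crc32c("123456789", 9) yields the standard CRC-32C check value 0xE3069283
regardless of the alignment of the input pointer.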

Pushed, thanks everyone!

- Heikki

In response to

Re: Optimize Arm64 crc32c implementation in Postgresql at 2018-04-03 17:47:18 from Tom Lane

Responses

Re: Optimize Arm64 crc32c implementation in Postgresql at 2018-04-04 11:13:42 from Thomas Munro