Re: always use runtime checks for CRC-32C instructions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org, Xiang(dot)Gao(at)arm(dot)com
Subject: Re: always use runtime checks for CRC-32C instructions
Date: 2023-10-31 19:16:16
Message-ID: 2613682.1698779776@sss.pgh.pa.us
Lists: pgsql-hackers

Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
> On Mon, Oct 30, 2023 at 10:36:01PM -0500, Nathan Bossart wrote:
>> I tested pg_waldump -z with 50M 65-byte records for the following
>> implementations on an ARM system:
>>
>> * slicing-by-8 : ~3.08s
>> * proposed patches applied (runtime check) : ~2.44s
>> * only CRC intrinsics implementation compiled : ~2.42s
>> * forced inlining : ~2.38s
>>
>> Avoiding the runtime check produced a 0.8% improvement, and forced inlining
>> produced another 1.7% improvement. In comparison, even the runtime check
>> implementation produced a 20.8% improvement over the slicing-by-8 one.

I find these numbers fairly concerning. If you can see a
couple-of-percent slowdown on a macroscopic benchmark like pg_waldump,
that implies that the percentage slowdown considering the CRC
operation alone is much worse. So there may be other use-cases where
we would take a bigger relative hit.
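
A rough sketch of that arithmetic, in C. The helper name and the CRC fraction are assumptions for illustration; nothing in this thread measures how much of pg_waldump's runtime is spent in CRC computation. If the entire difference between two whole-benchmark timings is attributed to the CRC code, the slowdown of the CRC code alone is the overall slowdown divided by the fraction of time spent in CRC.

```c
/*
 * Hypothetical helper: estimate the slowdown of the CRC code alone from
 * whole-benchmark timings.
 *
 * t_fast       - total runtime without the runtime check (seconds)
 * t_slow       - total runtime with the runtime check (seconds)
 * crc_fraction - ASSUMED fraction of t_fast spent computing CRCs (0..1);
 *                this is not a measured value.
 */
double
crc_only_slowdown(double t_fast, double t_slow, double crc_fraction)
{
    /* Slowdown of the benchmark as a whole, e.g. 0.008 for 0.8%. */
    double total_slowdown = (t_slow - t_fast) / t_fast;

    /* All of the extra time lands in the CRC portion of the run. */
    return total_slowdown / crc_fraction;
}
```

With the 2.42s and 2.44s timings quoted above and an assumed CRC share of 10%, this gives roughly 8% for the CRC operation itself; a smaller assumed share gives a proportionally larger figure.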

> * From my quick scan of a few dozen machines on the buildfarm, it looks
> like the runtime checks are already the norm, so the number of systems
> that would be subject to a regression from v16 to v17 should be pretty
> small, in theory. And this regression seems to be on the order of 1%
> based on the numbers above.

I did a more thorough scrape of the buildfarm results. Of 161 animals
currently reporting configure output on HEAD, we have

     2  ARMv8 CRC instructions
    36  ARMv8 CRC instructions with runtime check
     2  LoongArch CRCC instructions
     2  SSE 4.2
    52  SSE 4.2 with runtime check
    67  slicing-by-8

While that'd seem to support your conclusion, the two using ARM CRC
*without* a runtime check are my Apple M1 Mac animals (sifaka/indri);
and I see the same selection on my laptop. So one platform where
we'd clearly be taking a regression is M-series Macs; that's a pretty
popular platform. The two using SSE without a check are prion and
tayra. I notice those are using gcc 11, so perhaps the default cflags
have changed to include -msse4.2 recently? I couldn't see much other
pattern, though. (Scraping results attached in case anybody wants to
look.)
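
One way to see what a compiler's default flags enable is to look at the macros it predefines. The sketch below is an illustration only and is not PostgreSQL's configure logic; the function name is hypothetical. It relies on two macros that do exist: gcc and clang define `__SSE4_2__` when SSE 4.2 is enabled for the target (for example via -msse4.2), and the Arm C Language Extensions define `__ARM_FEATURE_CRC32` when the CRC32 instructions are part of the target.

```c
/*
 * Hypothetical helper: report which CRC-32C path could be chosen at
 * compile time, judging only by the compiler's predefined macros.
 * The returned strings mirror the configure result categories above.
 */
const char *
crc32c_compile_time_choice(void)
{
#if defined(__SSE4_2__)
    /* Target flags already include SSE 4.2: no runtime check needed. */
    return "SSE 4.2";
#elif defined(__ARM_FEATURE_CRC32)
    /* Target architecture already includes the ARMv8 CRC extension. */
    return "ARMv8 CRC instructions";
#else
    /* Nothing guaranteed by the flags: probe at runtime or fall back. */
    return "runtime check or slicing-by-8";
#endif
}
```

Compiling this with and without -msse4.2 on an x86 machine, or with the default flags on an M-series Mac, shows which branch a given toolchain would take.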

Really this just reinforces my concern that doing a runtime check
all the time is on the wrong side of history. I grant that we've
got to do that for anything where the availability of the instruction
is really in serious question, but I'm not very convinced that that's
a majority situation on popular platforms.
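
For readers following along, here is a minimal sketch of the two calling styles under discussion. It is not the PostgreSQL implementation: every name is hypothetical, the hardware probe is a stub, and a portable bitwise CRC-32C stands in for the intrinsics so the example compiles anywhere. The runtime-check style goes through a function pointer that is resolved on first use; the compile-time style calls the implementation directly, which lets the compiler inline it.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

/* Portable bitwise CRC-32C (reflected Castagnoli polynomial 0x82F63B78). */
static uint32_t
crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Placeholder for an intrinsics-based version; here it reuses the software one. */
static uint32_t
crc32c_hw_placeholder(uint32_t crc, const void *data, size_t len)
{
    return crc32c_sw(crc, data, len);
}

/* Hypothetical CPU probe; a real one would query CPUID, getauxval(), etc. */
static int
hw_crc32c_available(void)
{
    return 0;
}

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len);

/* Starts out pointing at the chooser, which replaces itself on first call. */
static crc32c_fn crc32c_impl = crc32c_choose;

static uint32_t
crc32c_choose(uint32_t crc, const void *data, size_t len)
{
    crc32c_impl = hw_crc32c_available() ? crc32c_hw_placeholder : crc32c_sw;
    return crc32c_impl(crc, data, len);
}

/* Runtime-check style: every call is an indirect call through the pointer. */
uint32_t
crc32c_runtime(const void *data, size_t len)
{
    return crc32c_impl(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}

/* Compile-time style: a direct call that the compiler is free to inline. */
uint32_t
crc32c_direct(const void *data, size_t len)
{
    return crc32c_sw(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

Both entry points return the same checksum (0xE3069283 for the nine ASCII bytes "123456789", the standard CRC-32C check value). The difference is only in call overhead, which matters most when the inputs are short, as with the 65-byte records in the benchmark quoted above.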

regards, tom lane

Attachment: results.csv (text/plain, 13.3 KB)
