Re: What exactly is our CRC algorithm?

From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: What exactly is our CRC algorithm?
Date: 2014-11-19 15:58:11
Message-ID: 20141119155811.GA32492@toroid.org
Lists: pgsql-hackers

At 2014-11-11 16:56:00 +0530, ams(at)2ndQuadrant(dot)com wrote:
>
> I'm working on this (first speeding up the default calculation using
> slice-by-N, then adding support for the SSE4.2 CRC instruction on
> top).

I've done the first part in the attached patch, and I'm working on the
second (especially the bits to issue CPUID at startup and decide which
implementation to use).
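The slice-by-N idea quoted above replaces the classic one-table-lookup-per-byte loop with N lookup tables, so that each iteration folds N input bytes into the CRC using N independent lookups. What follows is a minimal C sketch of slice-by-8, not the code in slice8.diff. It assumes CRC-32C (Castagnoli, reflected polynomial 0x82F63B78), chosen because that is the polynomial the SSE4.2 crc32 instruction implements. The names crc32c, crc32c_update_slice8, crc32c_init_tables, crc32c_bitwise and crc32c_table are hypothetical. A bit-at-a-time reference is included for cross-checking. Slice-by-4 follows the same pattern with four tables and four bytes per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: CRC-32C, reflected polynomial. */
#define CRC32C_POLY 0x82F63B78u

/* Hypothetical layout: crc32c_table[k][b] is the contribution of byte b
 * when k more bytes of the same 8-byte block follow it. */
static uint32_t crc32c_table[8][256];
static int crc32c_table_ready = 0;

static void crc32c_init_tables(void)
{
    /* Table 0: the ordinary byte-at-a-time table. */
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ CRC32C_POLY : crc >> 1;
        crc32c_table[0][i] = crc;
    }
    /* Each further table pushes the entry through one more zero byte. */
    for (int k = 1; k < 8; k++)
        for (uint32_t i = 0; i < 256; i++) {
            uint32_t prev = crc32c_table[k - 1][i];
            crc32c_table[k][i] = crc32c_table[0][prev & 0xFF] ^ (prev >> 8);
        }
    crc32c_table_ready = 1;
}

/* Assemble 4 bytes as little-endian, independent of host byte order. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

static uint32_t crc32c_update_slice8(uint32_t crc, const unsigned char *p,
                                     size_t len)
{
    /* Main loop: fold 8 input bytes into the CRC with 8 table lookups. */
    while (len >= 8) {
        uint32_t lo = load_le32(p) ^ crc;
        uint32_t hi = load_le32(p + 4);
        crc = crc32c_table[7][lo & 0xFF] ^ crc32c_table[6][(lo >> 8) & 0xFF] ^
              crc32c_table[5][(lo >> 16) & 0xFF] ^ crc32c_table[4][lo >> 24] ^
              crc32c_table[3][hi & 0xFF] ^ crc32c_table[2][(hi >> 8) & 0xFF] ^
              crc32c_table[1][(hi >> 16) & 0xFF] ^ crc32c_table[0][hi >> 24];
        p += 8;
        len -= 8;
    }
    /* Tail: byte-at-a-time lookup for the remaining 0..7 bytes. */
    while (len--)
        crc = crc32c_table[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    return crc;
}

/* Hypothetical entry point; the lazy table init is not thread-safe. */
static uint32_t crc32c(const void *data, size_t len)
{
    if (!crc32c_table_ready)
        crc32c_init_tables();
    return ~crc32c_update_slice8(0xFFFFFFFFu, data, len);
}

/* Bit-at-a-time reference, only for cross-checking the table version. */
static uint32_t crc32c_bitwise(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ CRC32C_POLY : crc >> 1;
    }
    return ~crc;
}
```

With these definitions, crc32c("123456789", 9) should return the standard CRC-32C check value 0xE3069283.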

As a benchmark, I ran pg_xlogdump --stats against 11GB of WAL data (674
segments) generated by running a total of 2M pgbench transactions on a
db initialised with scale factor 25. The tests were run on my i5-3230
CPU, and the code in each case was compiled with "-O3 -msse4.2" (and
without --enable-debug). The profile was dominated by the CRC
calculation in ValidXLogRecord.

With HEAD's CRC code:

bin/pg_xlogdump --stats wal/000000010000000000000001 29.81s user 3.56s system 77% cpu 43.274 total
bin/pg_xlogdump --stats wal/000000010000000000000001 29.59s user 3.85s system 75% cpu 44.227 total

With slice-by-4 (a minor variant of the attached patch; the results are
included only for curiosity's sake, but I can post the code if needed):

bin/pg_xlogdump --stats wal/000000010000000000000001 13.52s user 3.82s system 48% cpu 35.808 total
bin/pg_xlogdump --stats wal/000000010000000000000001 13.34s user 3.96s system 48% cpu 35.834 total

With slice-by-8 (i.e. the attached patch):

bin/pg_xlogdump --stats wal/000000010000000000000001 7.88s user 3.96s system 34% cpu 34.414 total
bin/pg_xlogdump --stats wal/000000010000000000000001 7.85s user 4.10s system 34% cpu 35.001 total

(Note the progressive reduction in user time from ~29s to ~8s.)
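Averaging the two runs of each variant gives the following user-time ratios. These are derived only from the figures above. The hardware run is a single sample and is left out.

```latex
\begin{aligned}
\bar{t}_{\text{HEAD}} &= \tfrac{29.81 + 29.59}{2} = 29.70\,\text{s}, &
\bar{t}_{4} &= \tfrac{13.52 + 13.34}{2} = 13.43\,\text{s}, &
\bar{t}_{8} &= \tfrac{7.88 + 7.85}{2} \approx 7.87\,\text{s} \\
\frac{\bar{t}_{\text{HEAD}}}{\bar{t}_{4}} &\approx 2.21, &
\frac{\bar{t}_{\text{HEAD}}}{\bar{t}_{8}} &\approx 3.78
\end{aligned}
```

Wall-clock totals fell much less, from about 43.8 s to about 34.7 s, because user time is only part of the elapsed time in these runs.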

Finally, just for comparison, here's what happens when we use the
hardware instruction via gcc's __builtin_ia32_crc32xx intrinsics
(i.e. the additional patch I'm working on):

bin/pg_xlogdump --stats wal/000000010000000000000001 3.33s user 4.79s system 23% cpu 34.832 total

There are a number of potential micro-optimisations; I just wanted to
submit the obvious thing first and explore more possibilities later.

-- Abhijit

Attachment: slice8.diff (text/x-diff, 32.6 KB)
