Re: [REVIEW] Re: Compression of full-page-writes

From: Arthur Silva <arthurprs(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Date: 2014-09-14 00:50:55
Message-ID: CAO_YK0W5Jp2m2jhn20oF80nvxHWfAnhdQ-1ZABrrG4MKZuAiQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 13, 2014 at 1:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2014-09-13 08:52:33 +0300, Ants Aasma wrote:
> >> On Sat, Sep 13, 2014 at 6:59 AM, Arthur Silva <arthurprs(at)gmail(dot)com>
> wrote:
> >>> That's not entirely true. CRC-32C beats pretty much everything with
> the same
> >>> length quality-wise and has both hardware implementations and highly
> >>> optimized software versions.
>
> >> For better or for worse CRC is biased by detecting all single bit
> >> errors, the detection capability of larger errors is slightly
> >> diminished. The quality of the other algorithms I mentioned is also
> >> very good, while producing uniformly varying output.
>
> > There's also much more literature about the various CRCs in comparison
> > to some of these hash allgorithms.
>
> Indeed. CRCs have well-understood properties for error detection.
> Have any of these new algorithms been analyzed even a hundredth as
> thoroughly? No. I'm unimpressed by evidence-free claims that
> something else is "also very good".
>
> Now, CRCs are designed for detecting the sorts of short burst errors
> that are (or were, back in the day) common on phone lines. You could
> certainly make an argument that that's not the type of threat we face
> for PG data. However, I've not seen anyone actually make such an
> argument, let alone demonstrate that some other algorithm would be better.
> To start with, you'd need to explain precisely what other error pattern
> is more important to defend against, and why.
>
> regards, tom lane
>

Mysql went this way as well, changing the CRC polynomial in 5.6.

What we are looking for here is uniqueness thus better error detection. Not
avalanche effect, nor cryptographically secure, nor bit distribution.
As far as I'm aware CRC32C is unbeaten collision wise and time proven.

I couldn't find tests with xxhash and crc32 on the same hardware so I spent
some time putting together a benchmark (see attachment, to run it just
start run.sh)

I included a crc32 implementation using ssr4.2 instructions (which works on
pretty much any Intel processor built after 2008 and AMD built after 2012),
a portable Slice-By-8 software implementation and xxhash since it's the
fastest software 32bit hash I know of.

Here're the results running the test program on my i5-4200M

crc sb8: 90444623
elapsed: 0.513688s
speed: 1.485220 GB/s

crc hw: 90444623
elapsed: 0.048327s
speed: 15.786877 GB/s

xxhash: 7f4a8d5
elapsed: 0.182100s
speed: 4.189663 GB/s

The hardware version is insanely and works on the majority of Postgres
setups and the fallback software implementations is 2.8x slower than the
fastest 32bit hash around.

Hopefully it'll be useful in the discussion.

Attachment Content-Type Size
bench.zip application/zip 22.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-09-14 01:06:26 Re: Audit of logout
Previous Message Peter Eisentraut 2014-09-14 00:46:35 Re: run xmllint during build (was Re: need xmllint on borka)