Re: backup manifests

From: Tels <nospam-pg-abuse(at)bloodgate(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2019-11-22 22:15:29
Message-ID: c677dec6e478b2b3459f518354cc4fcf@bloodgate.com

Moin Robert,

On 2019-11-22 20:01, Robert Haas wrote:
> On Fri, Nov 22, 2019 at 1:10 PM David Steele <david(at)pgmasters(dot)net>
> wrote:
>> Well, the maximum amount of data that can be protected with a 32-bit
>> CRC is 512MB according to all the sources I found (NIST, Wikipedia,
>> etc). I presume that's what we are talking about since I can't find
>> any 64-bit CRC code in core or this patch.
>
> Could you give a more precise citation for this? I can't find a
> reference to that in the Wikipedia article off-hand and I don't know
> where to look in NIST. I apologize if I'm being dense here, but I
> don't see why there should be any limit on the amount of data that can
> be protected. The important thing is that if the original file F is
> altered to F', we hope that CHECKSUM(F) != CHECKSUM(F'). The
> probability of that, assuming that the alteration is random rather
> than malicious and that the checksum function is equally likely to
> produce every possible output, is just 1-2^-${CHECKSUM_BITS},
> regardless of the length of the message (except that there might be
> some special cases for very short messages, which don't matter here).
>
> This analysis by me seems to match
> https://en.wikipedia.org/wiki/Cyclic_redundancy_check, which says:
>
> "Typically an n-bit CRC applied to a data block of arbitrary length
> will detect any single error burst not longer than n bits, and the
> fraction of all longer error bursts that it will detect is (1 −
> 2^−n)."
>
> Notice the phrase "a data block of arbitrary length" and the formula "1
> - 2^-n".

It is related to the number of states, and the birthday problem factors
into it, too:

https://en.wikipedia.org/wiki/Birthday_problem

If you have a 32-bit checksum or hash, it can represent at most 2**32
distinct states (or fewer, if the algorithm isn't really good).

Each byte is 8 bits, so 2**32 / 8 is 512 MByte. If you process your data
bit by bit, each new bit adds a new state (consider: missing bit == 0,
added bit == 1). If every new state were represented by a different
checksum, all possible 2**32 values would be exhausted after processing
512 MByte; after that you are bound to hit one of the former states
again - aka a collision.

There is no way around it with so few bits, no matter which algorithm
you choose.
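
For anyone who wants to put numbers on that, here is a back-of-the-envelope
sketch (mine, not from the patch, and assuming the checksum outputs are
uniformly distributed) of the usual birthday-problem approximation for a
32-bit checksum:

  # Birthday-problem approximation: chance that at least two of n random
  # 32-bit checksums collide, p ~= 1 - exp(-n*(n-1) / (2 * 2**32)).
  import math

  def collision_probability(n, bits=32):
      space = 2 ** bits
      return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))

  for n in (1_000, 65_536, 1_000_000):
      print(f"{n:>9} checksums: p(collision) ~ {collision_probability(n):.4f}")

Already at around 2**16 (65,536) checksums the collision chance is roughly
39%.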

>> > Phrased more positively, if you want a cryptographic hash
>> > at all, you should probably use one that isn't widely viewed as too
>> > weak.
>>
>> Sure. There's another advantage to picking an algorithm with lower
>> collision rates, though.
>>
>> CRCs are fine for catching transmission errors (as caveated above) but
>> not as great for comparing two files for equality. With strong hashes
>> you can confidently compare local files against the path, size, and
>> hash stored in the manifest and save yourself a round-trip to the
>> remote storage to grab the file if it has not changed locally.
>
> I agree in part. I think there are two reasons why a cryptographically
> strong hash is desirable for delta restore. First, since the checksums
> are longer, the probability of a false match happening randomly is
> lower, which is important. Even if the above analysis is correct and
> the chance of a false match is just 2^-32 with a 32-bit CRC, if you
> back up ten million files every day, you'll likely get a false match
> within a few years or less, and once is too often. Second, unlike what
> I supposed above, the contents of a PostgreSQL data file are not
> chosen at random, unlike transmission errors, which probably are more
> or less random. It seems somewhat possible that there is an adversary
> who is trying to choose the data that gets stored in some particular
> record so as to create a false checksum match. A CRC is a lot easier
> to fool than a crytographic hash, so I think that using a CRC of *any*
> length for this kind of use case would be extremely dangerous no
> matter the probability of an accidental match.

Agreed. See above.
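
To put a rough number on the "ten million files every day" scenario (my own
back-of-the-envelope, assuming a perfectly uniform 32-bit checksum):

  # Expected number of changed files per year whose new contents happen
  # to keep the exact same 32-bit checksum, i.e. undetected changes.
  files_per_day = 10_000_000
  p_false_match = 2 ** -32
  print(f"{files_per_day * 365 * p_false_match:.2f}")   # prints 0.85

So you would indeed expect roughly one silent false match per year or so,
which matches the "within a few years or less" above.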

However, if you choose a hash, please do not go below SHA-256. Both MD5
and SHA-1 have already had practical collision attacks, and these are
only bound to get worse:

https://www.mscs.dal.ca/~selinger/md5collision/
https://shattered.io/

It might even be a wise idea to encode the hash algorithm used into the
manifest file, so it can be changed later. The hash length alone might
not be enough to determine which algorithm was used.
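
Just to illustrate the idea - the field names below are made up for this
example, not taken from the patch - an entry could simply carry the
algorithm name next to the digest:

  # Hypothetical manifest entry that records which algorithm produced
  # the checksum; none of these field names come from the actual patch.
  import hashlib, json, os

  def manifest_entry(path, algorithm="sha256"):
      h = hashlib.new(algorithm)
      with open(path, "rb") as f:
          for chunk in iter(lambda: f.read(65536), b""):
              h.update(chunk)
      return {
          "path": path,
          "size": os.path.getsize(path),
          "checksum-algorithm": algorithm,  # stored explicitly, can change later
          "checksum": h.hexdigest(),
      }

  print(json.dumps(manifest_entry("some/datafile")))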

>> This is the basic premise of what we call delta restore which can
>> speed up restores by orders of magnitude.
>>
>> Delta restore is the main advantage that made us decide to require
>> SHA1 checksums. In most cases, restore speed is more important than
>> backup speed.
>
> I see your point, but it's not the whole story. We've encountered a
> bunch of cases where the time it took to complete a backup exceeded
> the user's desired backup interval, which is obviously very bad, or
> even more commonly where it exceeded the length of the user's
> "low-usage" period when they could tolerate the extra overhead imposed
> by the backup. A few percentage points is probably not a big deal, but
> a user who has an 8-hour window to get the backup done overnight will
> not be happy if it's taking 6 hours now and we tack 40%-50% on to
> that. So I think that we either have to disable backup checksums by
> default, or figure out a way to get the overhead down to something a
> lot smaller than what current tests are showing -- which we could
> possibly do without changing the algorithm if we can somehow make it a
> lot cheaper, but otherwise I think the choice is between disabling the
> functionality altogether by default and adopting a less-expensive
> algorithm. Maybe someday when delta restore is in core and widely used
> and CPUs are faster, it'll make sense to revise the default, and
> that's cool, but I can't see imposing a big overhead by default to
> enable a feature core doesn't have yet...

Modern algorithms are amazingly fast on modern hardware, and some are
even implemented in hardware nowadays:

https://software.intel.com/en-us/articles/intel-sha-extensions

Quote from:

https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring-sha-extensions-to-intels-cpus/

"Despite the extremely limited availability of SHA extension support
in modern desktop and mobile processors, crypto libraries have already
upstreamed support to great effect. Botan’s SHA extension patches show
a
significant 3x to 5x performance boost when taking advantage of the
hardware
extensions, and the Linux kernel itself shipped with hardware SHA
support
with version 4.4, bringing a very respectable 3.6x performance upgrade
over
the already hardware-assisted SSE3-enabled code."

If you need to load the data from disk and shove it over a network, the
hashing will add very little overhead; it might even be completely
invisible, since it can run in parallel to all the other things. Sure,
there is such a thing as zero-copy networking, but if you have to
compress the data before sending it to the network, you have to put it
through the CPU anyway. And if you have more than one core, the second
one can do the hashing in parallel to the first one doing the
compression.
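
As a minimal sketch of what I mean (my own toy example, nothing from the
patch): feed each chunk to a hashing thread while the main thread keeps
compressing, so the hashing cost largely hides behind the compression cost.

  # Toy example: hash on a second thread while the first one compresses.
  import hashlib, queue, threading, zlib

  def hash_and_compress(path, chunk_size=1024 * 1024):
      digest = hashlib.sha256()
      chunks = queue.Queue(maxsize=4)

      def hasher():
          while True:
              chunk = chunks.get()
              if chunk is None:
                  break
              digest.update(chunk)

      worker = threading.Thread(target=hasher)
      worker.start()
      compressor = zlib.compressobj()
      compressed = []
      with open(path, "rb") as f:
          for chunk in iter(lambda: f.read(chunk_size), b""):
              chunks.put(chunk)                 # hashed on the other thread
              compressed.append(compressor.compress(chunk))
      compressed.append(compressor.flush())
      chunks.put(None)
      worker.join()
      return digest.hexdigest(), b"".join(compressed)

At least in CPython both hashlib and zlib release the GIL for larger
buffers, so the two halves really can run on separate cores.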

To get a feeling for the raw speeds, one can run:

openssl speed md5 sha1 sha256 sha512

On my really-not-fast desktop CPU (i5-4690T @ 2.50GHz) it says:

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5           122638.55k   277023.96k   487725.57k   630806.19k   683892.74k   688553.98k
sha1          127226.45k   313891.52k   632510.55k   865753.43k   960995.33k   977215.19k
sha256         77611.02k   173368.15k   325460.99k   412633.43k   447022.92k   448020.48k
sha512         51164.77k   205189.87k   361345.79k   543883.26k   638372.52k   645933.74k

Or in other words, it can hash nearly 931 MByte/s with SHA-1 and about
427 MByte/s with SHA-256 (if I haven't miscalculated something). You'd
need a pretty fast disk (aka an M.2 SSD) and network (aka > 1 Gbit) to
top these speeds, and then you'd use a real CPU for your server, not
some poor Intel power-saving surfing thingy-majingy :)
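
For reference, the conversion behind those two figures (openssl reports
thousands of bytes per second, so divide by 1024*1024 to get MByte/s):

  # Convert openssl's "1000s of bytes per second" to MByte/s for the
  # 16384-byte block column above.
  for algo, kbytes_per_sec in (("sha1", 977215.19), ("sha256", 448020.48)):
      print(f"{algo}: {kbytes_per_sec * 1000 / (1024 * 1024):.1f} MByte/s")
  # prints: sha1: 931.9 MByte/s, sha256: 427.3 MByte/s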

Best regards,

Tels
