Re: backup manifests

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2020-03-30 01:05:17
Message-ID: 630506b1-9503-d776-360a-660543797a95@pgmasters.net
Lists: pgsql-hackers

On 3/29/20 8:47 PM, Robert Haas wrote:
> On Fri, Mar 27, 2020 at 4:02 PM David Steele <david(at)pgmasters(dot)net> wrote:
>> I prefer to validate the size and checksum in the same pass, but I'm not
>> sure it's that big a deal. If the backup is being corrupted under the
>> validation process, that would also apply to files that had already been
>> validated.
>
> I did it like this because I thought that in typical scenarios it
> would be likely to produce useful results more quickly. For instance,
> suppose that you forget to restore the tablespace directories, and
> just get the main $PGDATA directory. Well, if you do it all in one
> pass, you might spend a long time checksumming things before you
> realize that some files are completely missing. I thought it would be
> useful to complain about files that are extra or missing or the wrong
> size FIRST, because that only requires us to stat() each file, and
> only after that do the comparatively expensive checksumming step that
> requires us to read the entire contents of each file. Granted, unless
> you use --exit-on-error, you're going to get all the complaints
> eventually anyway, but you might use that option, or you might hit ^C
> when you start to see a slew of complaints popping out.

Yeah, that seems reasonable.
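The two-pass ordering described above can be sketched roughly as follows. This is a minimal Python illustration, not the code under discussion: the manifest is assumed to be a plain dict mapping relative paths to (size, SHA-256 hex digest), and `validate_backup`, `sha256_of`, and the `exit_on_error` parameter (a stand-in for `--exit-on-error`) are hypothetical names. Pass 1 needs only a directory walk and a stat() per file; pass 2 reads whole files, and only for files that survived pass 1.

```python
import hashlib
import os
from typing import Dict, List, Tuple

# Hypothetical manifest format: relative path -> (size in bytes, SHA-256 hex digest).
Manifest = Dict[str, Tuple[int, str]]


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Read the entire file and return its SHA-256 hex digest (the expensive step)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def validate_backup(root: str, manifest: Manifest,
                    exit_on_error: bool = False) -> List[str]:
    """Two-pass check: presence and size first, checksums second."""
    errors: List[str] = []

    def report(msg: str) -> None:
        errors.append(msg)
        if exit_on_error:
            # Hypothetical stand-in for --exit-on-error: stop at the first complaint.
            raise RuntimeError(msg)

    # Pass 1: only a directory walk and stat() per file; no file contents are read.
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            on_disk.add(os.path.relpath(os.path.join(dirpath, name), root))

    for rel in sorted(on_disk - set(manifest)):
        report(f"extra file: {rel}")

    to_checksum: List[str] = []
    for rel, (size, _digest) in sorted(manifest.items()):
        if rel not in on_disk:
            report(f"missing file: {rel}")
        elif os.stat(os.path.join(root, rel)).st_size != size:
            report(f"size mismatch: {rel}")
        else:
            to_checksum.append(rel)

    # Pass 2: read every file that survived pass 1 in full to verify its checksum.
    for rel in to_checksum:
        if sha256_of(os.path.join(root, rel)) != manifest[rel][1]:
            report(f"checksum mismatch: {rel}")

    return errors
```

With `exit_on_error=True`, a missing tablespace directory or an extra file raises before any file has been read in full, which is the early feedback the ordering is meant to provide.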

In our case backups are nearly always compressed and/or encrypted, so
even checking the original size is a bit of work. Getting the checksum
at the same time seems like an obvious win.
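The single-pass point can be illustrated with a minimal sketch, assuming gzip compression and SHA-256 (the compression format and hash actually used by the tool David describes are not stated here). Once a file must be decompressed just to learn its original size, feeding each decompressed chunk to the hash as well costs little extra. The function name `size_and_checksum_gz` is hypothetical.

```python
import gzip
import hashlib
from typing import Tuple


def size_and_checksum_gz(path: str, chunk_size: int = 1 << 20) -> Tuple[int, str]:
    """Decompress a gzip file once, computing its original size and SHA-256 together.

    Assumption for illustration: gzip + SHA-256; other formats would swap the reader.
    """
    size = 0
    h = hashlib.sha256()
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)   # uncompressed byte count
            h.update(chunk)      # checksum over the same uncompressed bytes
    return size, h.hexdigest()
```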

Currently we don't have a separate validate command outside of restore,
but when we do we'll consider doing a pass to check for file presence
(and size when possible) first. Thanks!

> I wasn't worried about
> concurrent modification of the backup because then you're super-hosed
> no matter what.

Really, really, super-hosed.

Regards,
--
-David
david(at)pgmasters(dot)net
