Re: pg_combinebackup does not detect missing files

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_combinebackup does not detect missing files
Date: 2024-04-16 23:25:48
Message-ID: f1a4b02e-f6cd-412d-8ea7-9f3fb3fdcc8b@pgmasters.net
Lists: pgsql-hackers

On 4/16/24 23:50, Robert Haas wrote:
> On Wed, Apr 10, 2024 at 9:36 PM David Steele <david(at)pgmasters(dot)net> wrote:
>> I've been playing around with the incremental backup feature trying to
>> get a sense of how it can be practically used. One of the first things I
>> always try is to delete random files and see what happens.
>>
>> You can delete pretty much anything you want from the most recent
>> incremental backup (not the manifest) and it will not be detected.
>
> Sure, but you can also delete anything you want from the most recent
> non-incremental backup and it will also not be detected. There's no
> reason at all to hold incremental backup to a higher standard than we
> do in general.

Except that we are running pg_combinebackup on the incremental backup,
and the user might reasonably expect it to check backup integrity. It
actually does perform a number of integrity checks, but not this one.
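A minimal sketch of the missing-file check in question. It assumes a plain-format backup directory containing the JSON backup_manifest that pg_basebackup writes, whose "Files" array lists each file's "Path". This is not how pg_combinebackup or pg_verifybackup is actually implemented. The function names are hypothetical, and manifest entries that use "Encoded-Path" (non-UTF-8 file names) are skipped for brevity.

```python
import json
import os


def missing_from_manifest(manifest: dict, present: set[str]) -> list[str]:
    """Return paths listed in the manifest's "Files" array but absent from `present`."""
    missing = []
    for entry in manifest.get("Files", []):
        path = entry.get("Path")
        if path is None:
            # Non-UTF-8 names are stored as hex in "Encoded-Path";
            # this sketch does not decode them.
            continue
        if path not in present:
            missing.append(path)
    return missing


def find_missing_files(backup_dir: str) -> list[str]:
    """Compare a plain-format backup directory against its backup_manifest."""
    with open(os.path.join(backup_dir, "backup_manifest"), encoding="utf-8") as f:
        manifest = json.load(f)
    # Collect every file on disk as a path relative to the backup root,
    # using "/" separators to match the manifest's path format.
    present = set()
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), backup_dir)
            present.add(rel.replace(os.sep, "/"))
    return missing_from_manifest(manifest, present)
```

A non-empty result from find_missing_files() means a file recorded at backup time has since disappeared, which is exactly the case that goes unnoticed today.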

>> Maybe the answer here is to update the docs to specify that
>> pg_verifybackup should be run on all backup directories before
>> pg_combinebackup is run. Right now that is not at all clear.
>
> I don't want to make those kinds of prescriptive statements. If you
> want to verify the backups that you use as input to pg_combinebackup,
> you can use pg_verifybackup to do that, but it's not a requirement.
> I'm not averse to having some kind of statement in the documentation
> along the lines of "Note that pg_combinebackup does not attempt to
> verify that the individual backups are intact; for that, use
> pg_verifybackup."

I think we should do this at a minimum.
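A sketch of the verify-before-combine workflow discussed above. The helper names are hypothetical. It assumes plain-format backup directories listed oldest first (the full backup, then each incremental in order, as pg_combinebackup expects) and that pg_verifybackup and pg_combinebackup are on PATH. Backups taken without WAL would additionally need pg_verifybackup's --no-parse-wal option.

```python
import subprocess


def build_commands(backup_dirs: list[str], output_dir: str) -> list[list[str]]:
    """Return one pg_verifybackup command per input, then the pg_combinebackup command."""
    commands = [["pg_verifybackup", d] for d in backup_dirs]
    # pg_combinebackup takes the backups oldest-first and requires -o/--output.
    commands.append(["pg_combinebackup", *backup_dirs, "-o", output_dir])
    return commands


def verify_then_combine(backup_dirs: list[str], output_dir: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in build_commands(backup_dirs, output_dir):
        # check=True raises CalledProcessError on a non-zero exit status,
        # so pg_combinebackup never runs on a backup that failed verification.
        subprocess.run(cmd, check=True)
```

For example, verify_then_combine(["full", "incr1", "incr2"], "restored") verifies all three directories before combining them into "restored".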

> But I think it should be blindingly obvious to
> everyone that you can't go whacking around the inputs to a program and
> expect to get perfectly good output. I know it isn't blindingly
> obvious to everyone, which is why I'm not averse to adding something
> like what I just mentioned, and maybe it wouldn't be a bad idea to
> document in a few other places that you shouldn't randomly remove
> files from the data directory of your live cluster, either, because
> people seem to keep doing it, but really, the expectation that you
> can't just blow files away and expect good things to happen afterward
> should hardly need to be stated.

And yet, we see it all the time.

> I think it's very easy to go overboard with warnings of this type.
> Weird stuff comes to me all the time because people call me when the
> weird stuff happens, and I'm guessing that your experience is similar.
> But my actual personal experience, as opposed to the cases reported to
> me by others, practically never features files evaporating into the
> ether.

Same -- if it happens at all it is very rare. Virtually every time I am
able to track down the cause of missing files it is because the user
deleted them, usually to "save space" or because they "did not seem
important".

But given that this is pretty common in my experience, I think it is
smart to guard against it rather than just take it on faith that the
user hasn't done anything destructive.

Especially given how pg_combinebackup works, backups are going to
undergo a lot of user manipulation (pushing to and pulling from storage,
decompressing, untarring, etc.), and I think that means we should take
extra care.

Regards,
-David
