Re: block-level incremental backup

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-09-16 14:38:17
Message-ID: 20190916143817.GA6962@tamriel.snowman.net

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Mon, Sep 16, 2019 at 4:31 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > Can we think of using creation time for file? Basically, if the file
> > creation time is later than backup-labels "START TIME:", then include
> > that file entirely. I think one big point against this is clock skew
> > like what if somebody tinkers with the clock. And also, this can
> > cover cases like
> > what Jeevan has pointed but might not cover other cases which we found
> > problematic.
>
> Well that would mean, for example, that if you copied the data
> directory from one machine to another, the next "incremental" backup
> would turn into a full backup. That sucks. And in other situations,
> like resetting the clock, it could mean that you end up with a corrupt
> backup without any real ability for PostgreSQL to detect it. I'm not
> saying that it is impossible to create a practically useful system
> based on file time stamps, but I really don't like it.

In a number of cases, trying to make sure that, after a failover or a
copy of the backup, the next 'incremental' really is an 'incremental'
is dangerous. A better strategy for addressing this, and the other
issues raised on this thread recently, is to:

- Have a manifest of every file in each backup
- Always back up new files that weren't in the prior backup
- Keep a checksum of each file
- Track the timestamp of each file as of when it was backed up
- Track the file size of each file
- Track the starting timestamp of each backup
- Always include files with a modification time after the starting
timestamp of the prior backup, or if the file size has changed
- In the event of any anomalies (which includes things like a timeline
switch), use checksum matching (aka 'delta checksum backup') to
perform the backup instead of using timestamps (or just always do that
if you want to be particularly careful- having an option for it is
great)
- Probably other things I'm not thinking of off-hand, but this is at
least a good start; a sketch of the selection logic follows this list.
Make sure to checksum this information too.
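
Here's that selection logic as a minimal sketch (Python for brevity;
the manifest structure and names are made up for illustration, not any
existing tool's format):

import hashlib
import os

def sha256_of(path):
    # Stream the file so large relations don't have to fit in memory.
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()

def files_to_back_up(data_dir, prior_manifest, delta_checksums=False):
    # prior_manifest is assumed to look like {'start_time': <epoch
    # seconds>, 'files': {rel_path: {'size': ..., 'mtime': ...,
    # 'sha256': ...}}} -- an illustrative structure only.
    prior = prior_manifest['files']
    start_time = prior_manifest['start_time']
    selected = []
    for root, _, names in os.walk(data_dir):
        for name in names:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, data_dir)
            st = os.stat(path)
            old = prior.get(rel)
            if old is None:
                selected.append(rel)      # new file: always back it up
            elif st.st_size != old['size']:
                selected.append(rel)      # size changed: always back it up
            elif delta_checksums:
                if sha256_of(path) != old['sha256']:
                    selected.append(rel)  # contents changed under delta mode
            elif st.st_mtime >= start_time:
                selected.append(rel)      # modified at/after the prior start
    return selected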

I agree entirely that it is dangerous to simply rely on a file's
creation time compared against some other timestamp, or to rely on the
modification time of a given file across multiple backups (which has
been shown to reliably cause corruption, at least with rsync and its
1-second granularity on modification times).
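
To make that hazard concrete, here's a toy illustration (not rsync's
actual code) of why a size-plus-whole-second-mtime comparison can
silently skip a changed file:

def quick_check_would_skip(src_size, src_mtime_sec,
                           dst_size, dst_mtime_sec):
    # rsync-style default check: same size and same whole-second mtime
    # means "unchanged", so the file isn't re-copied.
    return src_size == dst_size and src_mtime_sec == dst_mtime_sec

# t=100.2s: the backup copies the file (size 8192, mtime truncates to 100)
# t=100.9s: PostgreSQL rewrites a block in place; the size is still 8192
#           and the mtime still truncates to 100
# On the next run, the changed file is wrongly skipped:
assert quick_check_would_skip(8192, 100, 8192, 100)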

By having a manifest of every backed-up file for each backup, you also
gain the ability to validate that a backup in the repository hasn't
been corrupted post-backup, a feature that at least some other database
backup and restore systems have (referring specifically to the big O in
this particular case, but I bet others do too).
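
Validation against such a manifest might look something like this
(again just a sketch, reusing the illustrative manifest format and
sha256_of() helper from above):

import os

def validate_backup(backup_dir, manifest):
    # Re-check every file in a stored backup against its manifest
    # entry, catching corruption that happened after the backup.
    problems = []
    for rel, entry in manifest['files'].items():
        path = os.path.join(backup_dir, rel)
        if not os.path.exists(path):
            problems.append((rel, 'missing'))
        elif os.path.getsize(path) != entry['size']:
            problems.append((rel, 'size mismatch'))
        elif sha256_of(path) != entry['sha256']:
            problems.append((rel, 'checksum mismatch'))
    return problems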

Having a system that keeps track of which backups are full and which
are differential also gives you the ability to do things like
expiration in a sensible way, including handling WAL expiration.
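
For instance, expiration has to respect the dependency chain from each
incremental back to its full; a rough sketch of that bookkeeping (the
catalog structure here is invented for illustration, not any real
tool's):

def expirable_backups(backups, keep_fulls):
    # backups is an oldest-first list of dicts like {'label': ...,
    # 'type': 'full' or 'incr', 'depends_on': label-or-None}.
    fulls = [b['label'] for b in backups if b['type'] == 'full']
    retained_fulls = set(fulls[-keep_fulls:]) if keep_fulls else set()
    retained = set(retained_fulls)
    for b in backups:
        if b['type'] != 'incr':
            continue
        # Walk the dependency chain down to the underlying full.
        cur = b
        while cur is not None and cur['type'] == 'incr':
            cur = next((p for p in backups
                        if p['label'] == cur['depends_on']), None)
        if cur is not None and cur['label'] in retained_fulls:
            retained.add(b['label'])
    # Everything else can go; WAL older than the start of the oldest
    # retained backup can be expired along with it.
    return [b['label'] for b in backups if b['label'] not in retained]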

As also mentioned up-thread, this likely also allows you to have a
simpler approach to parallelizing the overall backup.
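
Once the manifest gives you the list of files to copy, the copy step
parallelizes naturally; a sketch with a simple worker pool (function
names invented, and a real tool would also need to fsync, preserve
metadata, and tolerate files vanishing mid-backup):

import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def parallel_copy(data_dir, backup_dir, rel_paths, workers=4):
    def copy_one(rel):
        dst = os.path.join(backup_dir, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(data_dir, rel), dst)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(copy_one, rel_paths))  # list() surfaces exceptions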

I'd like to clarify that while I would like to have an easier way to
parallelize backups, that's a relatively minor complaint. The much
bigger issue I have with this feature is that trying to address
everything correctly, with only the amount of information that can be
passed on the command-line about the prior full/incremental, is going
to be extremely difficult and complicated, and is likely to lead to
subtle bugs in the actual code- and probably less-than-subtle bugs in
how users end up using it, since they'll have to implement the
expiration and tracking of information between backups themselves
(unless something's changed in that part during this discussion- I
admit that I've not read every email in this thread).

> > One related point is how do incremental backups handle the case where
> > vacuum truncates the relation partially? Basically, with current
> > patch/design, it doesn't appear that such information can be passed
> > via incremental backup. I am not sure if this is a problem, but it
> > would be good if we can somehow handle this.
>
> As to this, if you're taking a full backup of a particular file,
> there's no problem. If you're taking a partial backup of a particular
> file, you need to include the current length of the file and the
> identity and contents of each modified block. Then you're fine.

I would also expect this to be fine, but if there's an example of where
this is an issue, please share. The only issue that I can think of
off-hand is the orphaned-file risk: something like CREATE DATABASE or
perhaps ALTER TABLE .. SET TABLESPACE starts, a backup is taken while
that's happening, and the operation doesn't complete during the backup
(or during recovery, or perhaps even in some other scenarios- it's
unfortunately quite complicated). This orphaned-file risk isn't newly
discovered, but fixing it is pretty complicated- I would love to
discuss ideas around how to handle it.
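
For the truncation case specifically, a partial-file entry that records
the file's length at backup time handles it; a sketch of applying such
an entry at restore (the entry format here is invented, not the patch's
actual format):

import shutil

BLOCK_SIZE = 8192  # PostgreSQL's default block size

def apply_partial_file(prior_path, out_path, file_length, blocks):
    # blocks maps block number -> 8192-byte contents; file_length is
    # the file's size at backup time, so a vacuum truncation is
    # reproduced by truncating the reconstructed copy first.
    shutil.copyfile(prior_path, out_path)
    with open(out_path, 'r+b') as f:
        f.truncate(file_length)
        for blkno, data in sorted(blocks.items()):
            f.seek(blkno * BLOCK_SIZE)
            f.write(data)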

> > Isn't some operations where at the end we directly call heap_sync
> > without writing WAL will have a similar problem as well?
>
> Maybe. Can you give an example?

I'd be curious to hear what the concern is here also.

> > Similarly,
> > it is not very clear if unlogged relations are handled in some way if
> > not, the same could be documented.
>
> I think that we don't need to back up the contents of unlogged
> relations at all, right? Restoration from an online backup always
> involves running recovery, and so unlogged relations will anyway get
> zapped.

Unlogged relations shouldn't be in the backup at all, since, yes, they
get zapped at the start of recovery. We recently taught pg_basebackup
how to avoid backing them up, so this shouldn't be an issue; they
should be skipped for incrementals as well as for fulls. I expect the
orphaned-file problem also exists for UNLOGGED->LOGGED transitions.
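
For reference, the check boils down to spotting the '_init' fork for
each relfilenode and skipping that relation's other forks; a simplified
sketch (not pg_basebackup's actual C code):

import os

def unlogged_forks_to_skip(dbdir):
    # Each unlogged relation has an '_init' fork, e.g. '16384_init'.
    # The other files for that relfilenode ('16384', '16384.1',
    # '16384_fsm', '16384_vm', ...) can be skipped; the init fork
    # itself must be backed up so recovery can reset the relation.
    names = os.listdir(dbdir)
    init_bases = {n[:-len('_init')] for n in names if n.endswith('_init')}
    skip = set()
    for n in names:
        if n.endswith('_init'):
            continue
        base = n.split('.', 1)[0]   # drop segment: '16384.2' -> '16384'
        for fork in ('_fsm', '_vm'):
            if base.endswith(fork):
                base = base[:-len(fork)]
        if base in init_bases:
            skip.add(n)
    return skip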

Thanks,

Stephen
