Re: backup manifests

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backup manifests
Date: 2019-09-19 13:51:11
Message-ID: CA+TgmobANXUCZ1XrLJQbX95e3fJa0dRqV09Scjb2kmqt+gdkvg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 18, 2019 at 9:11 PM David Steele <david(at)pgmasters(dot)net> wrote:
> Also consider adding the timestamp.

Sounds reasonable, even if only for the benefit of humans who might
look at the file. We can decide later whether to use it for anything
else (and third-party tools could make different decisions from core).
I assume we're talking about file mtime here, not file ctime or file
atime or the time the manifest was generated, but let me know if I'm
wrong.

> Consider adding a reference to each file that specifies where the file
> can be found if it is not in this backup. As I understand the
> pg_basebackup proposal, it would only be implementing differential
> backups, i.e. an incremental that is *only* based on the last full
> backup. So, the reference can be inferred in this case. However, if
> the user selects the wrong full backup on restore, and we have labeled
> each backup, then a differential restore with references against the
> wrong full backup would result in a hard error rather than corruption.

I intend that we should be able to support incremental backups based
either on a previous full backup or based on a previous incremental
backup. I am not aware of a technical reason why we need to identify
the specific backup that must be used. If incremental backup B is
taken based on a pre-existing backup A, then I think that B can be
restored using either A or *any other backup taken after A and before
B*. In the normal case, there probably wouldn't be any such backup,
but AFAICS the start-LSNs are a sufficient cross-check that the chosen
base backup is legal.
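
As a rough illustration of that cross-check, here is a minimal Python
sketch. It assumes each backup records its start-LSN in the usual
textual form (two hexadecimal numbers separated by a slash) and that
incremental backup B also records the start-LSN of the backup A it was
taken against. The function and parameter names are hypothetical, and
the exact rule (at or after A's start, strictly before B's) is one
reading of the paragraph above, not a settled design.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/16B3748' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def base_is_legal(candidate_start: str,
                  reference_start: str,
                  incremental_start: str) -> bool:
    """Hypothetical check: may `candidate` serve as the base for B?

    reference_start   -- start-LSN of backup A, as recorded by B
    incremental_start -- start-LSN of incremental backup B itself
    candidate_start   -- start-LSN of the backup the user picked
    """
    candidate = parse_lsn(candidate_start)
    return parse_lsn(reference_start) <= candidate < parse_lsn(incremental_start)


# A started at 0/2000028, B at 0/5000028 (made-up values).
print(base_is_legal("0/2000028", "0/2000028", "0/5000028"))  # A itself
print(base_is_legal("0/3000028", "0/2000028", "0/5000028"))  # taken between A and B
print(base_is_legal("0/1000028", "0/2000028", "0/5000028"))  # older than A
```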

> Based on my original calculations (which sadly I don't have anymore),
> the combination of SHA1, size, and file name is *extremely* unlikely to
> generate a collision. As in, unlikely to happen before the end of the
> universe kind of unlikely. Though, I guess it depends on your
> expectations for the lifetime of the universe.

Somebody once said that we should be prepared for it to end at any
time, or not, and that the time at which it actually was due to end
would not be disclosed in advance. This is probably good life advice
which I ought to take more frequently than I do, but I think we can
finesse the issue for purposes of this discussion. What I'd say is: if
the probability of getting a collision is demonstrably many orders of
magnitude less than the probability of the disk writing the block
incorrectly, then I think we're probably reasonably OK. Somebody might
differ, which is perhaps a mild point in favor of LSN-based
approaches, but as a practical matter, if a bad block is a billion
times more likely to be the result of a disk error than a checksum
collision, then it's a negligible risk.
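
To put rough numbers on that comparison, here is a small Python
sketch. It treats SHA-1 as an ideal 160-bit hash against accidental
(not adversarially constructed) changes, and uses the birthday bound
as a deliberately pessimistic estimate. The file count and the disk
error probability below are assumptions chosen for illustration, not
measurements.

```python
def birthday_bound(n_items: int, hash_bits: int) -> float:
    """Upper bound on the chance that any two of n_items random hashes collide."""
    pairs = n_items * (n_items - 1) // 2
    return pairs / 2**hash_bits


SHA1_BITS = 160
N_FILES = 10**9            # assumed: far more files than a typical cluster holds
DISK_ERROR_PROB = 1e-15    # assumed: illustrative chance of a bad write

collision_prob = birthday_bound(N_FILES, SHA1_BITS)
ratio = DISK_ERROR_PROB / collision_prob

print(f"collision bound: {collision_prob:.2e}")
print(f"a disk error is about {ratio:.1e} times more likely")
```

Under these assumptions the gap is far wider than a factor of a
billion, and the bound is already pessimistic, since what matters here
is a changed file matching its own old checksum rather than any two
files matching each other.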

> > And maybe a few other bits of metadata, but I'm not sure
> > exactly what. Ideas?
>
> A backup label for sure. You can also use this as the directory/tar
> name to save the user coming up with one. We use YYYYMMDDHH24MMSSF for
> full backups and YYYYMMDDHH24MMSSF_YYYYMMDDHH24MMSS(D|I) for
> incrementals and have logic to prevent two backups from having the same
> label. This is unlikely outside of testing but still a good idea.
>
> Knowing the start/stop time of the backup is useful in all kinds of
> ways, especially monitoring and time-targeted PITR. Start/stop LSN is
> also good. I know this is also in backup_label but having it all in one
> place is nice.
>
> We include the version/sysid of the cluster to avoid mixups. It's a
> great extra check on top of references to be sure everything is kosher.

I don't think it's a good idea to duplicate the information that's
already in the backup_label. Storing two copies of the same
information is just an invitation to having to worry about what
happens if they don't agree.

> A manifest version is good in case we change the format later.

Yeah.

> I'd recommend JSON for the format since it is so ubiquitous and easily
> handles escaping, which can be a gotcha in a home-grown format. We
> currently have a format that is a combination of Windows INI and JSON
> (for human-readability in theory) and we have become painfully aware of
> escaping issues. Really, why would you drop files with '=' in their
> name in PGDATA? And yet it happens.

I am not crazy about JSON because it requires that I get a json parser
into src/common, which I could do, but given the possibly-imminent end
of the universe, I'm not sure it's the greatest use of time. You're
right that if we pick an ad-hoc format, we've got to worry about
escaping, which isn't lovely.
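
To make the escaping concern concrete, here is a short Python sketch.
The name=size line layout is a made-up stand-in for an ad-hoc format,
not any proposed manifest syntax, and Python's json module is used
only because it is handy for illustration.

```python
import json

name = "base/odd=name.dat"   # a legal file name containing '='
size = 8192

# Ad-hoc "name=size" line: the first '=' is ambiguous on the way back in.
adhoc_line = f"{name}={size}"
adhoc_name, adhoc_rest = adhoc_line.split("=", 1)

# JSON line: the encoder quotes and escapes the name, so it round-trips.
json_line = json.dumps({"path": name, "size": size})
decoded = json.loads(json_line)

print(adhoc_name)        # truncated at the first '='
print(decoded["path"])   # the original name
```

Splitting on the last '=' instead would rescue this particular case,
but newlines or other delimiters in file names would still need an
escaping rule of their own.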

> > (1) When taking a backup, have the option (perhaps enabled by default)
> > to include a backup manifest.
>
> Manifests are cheap to build so I wouldn't make it an option.

Huh. That's an interesting idea. Thanks.

> > (3) Cross-check a manifest against a backup and complain about extra
> > files, missing files, size differences, or checksum mismatches.
>
> Verification is the best part of the manifest. Plus, you can do
> verification pretty cheaply on restore. We also restore pg_control last
> so clusters that have a restore error won't start.

There's no "restore" operation here, really. A backup taken by
pg_basebackup can be "restored" by copying the whole thing, but it can
also be used just where it is. If we were going to build something
into some in-core tool to copy backups around, this would be a smart
way to implement said tool, but I'm not planning on that myself.
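
For what it's worth, the cross-check in (3) is simple enough to
sketch. The following Python is only an illustration: the manifest is
modelled as a dict mapping relative path to (size, SHA-1 hex digest),
which is an assumed in-memory shape rather than any agreed file
format, and the function names are hypothetical.

```python
import hashlib
import os


def scan_backup(root):
    """Return {relative_path: (size, sha1_hex)} for every file under root."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            digest = hashlib.sha1()
            with open(full, "rb") as f:
                # Read in 1 MB chunks so large relation files fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            rel = os.path.relpath(full, root)
            found[rel] = (os.path.getsize(full), digest.hexdigest())
    return found


def cross_check(manifest, found):
    """Compare a manifest against scanned files and list every discrepancy."""
    problems = []
    for path in sorted(manifest.keys() - found.keys()):
        problems.append(f"missing file: {path}")
    for path in sorted(found.keys() - manifest.keys()):
        problems.append(f"extra file: {path}")
    for path in sorted(manifest.keys() & found.keys()):
        want_size, want_sum = manifest[path]
        got_size, got_sum = found[path]
        if want_size != got_size:
            problems.append(f"size mismatch: {path}")
        elif want_sum != got_sum:
            problems.append(f"checksum mismatch: {path}")
    return problems
```

A caller would run cross_check(manifest, scan_backup(backup_dir)) and
treat an empty list as success. If the manifest file itself lives
inside the backup directory, it would have to be excluded from the
scan.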

> > One thing I'm not quite sure about is where to store the backup
> > manifest. If you take a base backup in tar format, you get base.tar,
> > pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
> > Does the backup manifest go into base.tar? Get written into a separate
> > file outside of any tar archive? Something else? And what about a
> > plain-format backup? I suppose then we should just write the manifest
> > into the top level of the main data directory, but perhaps someone has
> > another idea.
>
> We do:
>
> [backup_label]/
> backup.manifest
> pg_data/
> pg_tblspc/
>
> In general, having the manifest easily accessible is ideal.

That's a fine choice for a tool, but I'm talking about something
that is part of the actual backup format supported by PostgreSQL, not
what a tool might wrap around it. The choice is whether, for a
tar-format backup, the manifest goes inside a tar file or as a
separate file. To put that another way, a patch adding backup
manifests does not get to redesign where pg_basebackup puts anything
else; it only gets to decide where to put the manifest.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
