Re: backup manifests

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2020-03-30 05:58:54
Message-ID: 20200330055854.GC2324620@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 29, 2020 at 08:42:35PM -0400, Robert Haas wrote:
> On Sat, Mar 28, 2020 at 11:40 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > I think this functionality doesn't belong in its own program. If you suspect
> > pg_basebackup or pg_restore will eventually gain the ability to merge
> > incremental backups into a recovery-ready base backup, I would put the
> > functionality in that program. Otherwise, I would put it in pg_checksums.
> > For me, part of the friction here is that the program description indicates
> > general verification, but the actual functionality merely checks hashes on a
> > directory tree that happens to represent a PostgreSQL base backup.
>
> Suraj's original patch made this part of pg_basebackup, but I didn't
> really like that, because I wanted it to have its own set of options.
> I still think all the options I've added are pretty useful ones, and I
> can think of other things somebody might want to do. It feels very
> uncomfortable to make pg_basebackup, or pg_checksums, take either
> options from set A and do thing X, or options from set B and do thing
> Y.

pg_checksums does already have that property, for what it's worth. (More
specifically, certain options dictate the mode, and it reports an error if
another option is incompatible with the mode.)

> But it feels clear that the name pg_validatebackup is not going
> over very well with anyone. I think I should rename it to
> pg_validatemanifest.

Between those two, I would use "pg_validatebackup" if there's a fair chance it
will end up doing the pg_waldump check. Otherwise, I would use
"pg_validatemanifest". I still most prefer delivering this as a mode of an
existing program.

> > > + parse->pathname = palloc(raw_length + 1);
> >
> > I don't see this freed anywhere; is it? (It's useful to make peak memory
> > consumption not grow in proportion to the number of files backed up.)
>
> We need the hash table to remain populated for the whole run time of
> the tool, because we're essentially doing a full join of the actual
> directory contents against the manifest contents. That's a bit
> unfortunate but it doesn't seem simple to improve. I think the only
> people who are really going to suffer are people who have an enormous
> pile of empty or nearly-empty relations. People who have large
> databases for the normal reason - i.e. a reasonable number of tables
> that hold a lot of data - will have manifests of very manageable size.

Okay.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2020-03-30 06:08:27 Re: [HACKERS] WAL logging problem in 9.4.3?
Previous Message Kyotaro Horiguchi 2020-03-30 05:56:11 Re: [HACKERS] WAL logging problem in 9.4.3?