
Re: block-level incremental backup

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: block-level incremental backup
Date:	2019-09-17 16:09:08
Message-ID:	20190917160908.GH6962@tamriel.snowman.net
Lists:	pgsql-hackers

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Mon, Sep 16, 2019 at 3:38 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > As discussed nearby, not everything that needs to be included in the
> > backup is actually going to be in the WAL though, right? How would that
> > ever be able to handle the case where someone starts the server under
> > wal_level = logical, takes a full backup, then restarts with wal_level =
> > minimal, writes out a bunch of new data, and then restarts back to
> > wal_level = logical and takes an incremental?
>
> Fair point. I think the WAL-scanning approach can only work if
> wal_level > minimal. But, I also think that few people run with
> wal_level = minimal in this era where the default has been changed to
> replica; and I think we can detect the WAL level in use while scanning
> WAL. It can only change at a checkpoint.

We need to be sure that we can detect if the WAL level has ever been set
to minimal between a full and an incremental and, if so, either refuse
to run the incremental, or promote it to a full, or make it a
checksum-based incremental instead of trusting the WAL stream.
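
A minimal sketch of that decision follows. It assumes the backup tool can collect the wal_level in effect at each checkpoint between the full backup and now, relying on the point above that wal_level can only change at a checkpoint. The `IncrementalMode` enum, the function name, and the input list are all hypothetical, not anything pg_basebackup provides today:

```python
from enum import Enum
from typing import Iterable


class IncrementalMode(Enum):
    WAL_BASED = "wal-based"        # trust the WAL stream to find changed blocks
    CHECKSUM_BASED = "checksum"    # fall back to comparing file checksums
    PROMOTE_TO_FULL = "full"       # take a full backup instead
    REFUSE = "refuse"              # do not take the incremental at all


def choose_incremental_mode(
    wal_levels_seen: Iterable[str],
    fallback: IncrementalMode = IncrementalMode.CHECKSUM_BASED,
) -> IncrementalMode:
    """Pick how to take an incremental backup.

    wal_levels_seen is a hypothetical input: the wal_level recorded at each
    checkpoint between the prior full backup and now.
    """
    if fallback is IncrementalMode.WAL_BASED:
        raise ValueError("fallback must not trust the WAL stream")
    levels = list(wal_levels_seen)
    if not levels:
        # No WAL information at all: nothing proves every change was logged.
        return fallback
    if any(level == "minimal" for level in levels):
        # Under minimal, some writes may never have been WAL-logged, so the
        # WAL stream cannot be trusted to list every modified block.
        return fallback
    return IncrementalMode.WAL_BASED
```

Whether the fallback should be refusing, promoting to a full, or a checksum-based incremental is left as a parameter, since any of the three is safe.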

I'm also glad that we ended up changing the default, though, and I do hope
that there are relatively few people running with minimal and even fewer
who flip it back and forth.

> > On larger systems, so many of the files are 1GB in size that checking
> > the file size is quite close to meaningless. Yes, having to checksum
> > all of the files definitely adds to the cost of taking the backup, but
> > to avoid it we need strong assurances that a given file hasn't been
> > changed since our last full backup. WAL, today at least, isn't quite
> > that, and timestamps can possibly be fooled with, so if you'd like to be
> > particularly careful, there doesn't seem to be a lot of alternatives.
>
> I see your points, but it feels like you're trying to talk down the
> WAL-based approach over what seem to me to be fairly manageable corner
> cases.

Just to be clear, I see your points and I like the general idea of
finding solutions, but it seems like the issues are likely to be pretty
complex and I'm not sure that's being appreciated very well.
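
To make the checksum cost concrete, here is a rough sketch of a checksum-based incremental, which trusts neither WAL nor timestamps: every file is read in full and compared against a digest recorded at the previous backup. The manifest shape (relative path to SHA-256 hex digest) and the function names are assumptions for illustration only:

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so 1GB segments need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(data_dir: Path, previous: Dict[str, str]) -> List[str]:
    """Return relative paths whose content differs from the previous backup.

    previous maps relative path -> SHA-256 hex digest recorded at the last
    backup (a hypothetical manifest format). Every file is read in full,
    which is the cost being discussed; in exchange, neither size nor
    mtime is trusted. Files missing from the manifest count as changed.
    """
    changed = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(data_dir).as_posix()
        if previous.get(rel) != file_sha256(path):
            changed.append(rel)
    return changed
```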

> > I'm not asking you to be an expert on those systems, just to help me
> > understand the statements you're making. How is backing up to a
> > pgbackrest repo different than running a pg_basebackup in the context of
> > using some other Enterprise backup system? In both cases, you'll have a
> > full copy of the backup (presumably compressed) somewhere out on a disk
> > or filesystem which is then backed up by the Enterprise tool.
>
> Well, I think that what people really want is to be able to backup
> straight into the enterprise tool, without an intermediate step.

OK, I can understand that, but I don't see how these changes to
pg_basebackup will help facilitate it. If they don't, and what you're
talking about here is independent, then great, that clarifies things.
But if you're saying that these changes to pg_basebackup are meant to
help with backing up directly into those Enterprise systems, then I'm
just asking for some help in understanding how: what's the use-case
we're adding to pg_basebackup that makes it work with these Enterprise
systems?

I'm not trying to be difficult here, I'm just trying to understand.

> My basic point here is: As with practically all PostgreSQL
> development, I think we should try to expose capabilities and avoid
> making policy on behalf of users.
>
> I'm not objecting to the idea of having tools that can help users
> figure out how much WAL they need to retain -- but insofar as we can
> do it, such tools should work regardless of where that WAL is actually
> stored.

How would that tool work, if it's to be able to work regardless of where
the WAL is actually stored? Today, pg_archivecleanup just works against
a POSIX filesystem. Are you thinking that the tool would have a
pluggable storage system, so that it could work with, say, a POSIX
filesystem, or a CIFS mount, or an S3-like system?
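
Purely to illustrate the question, a pluggable version might look something like the sketch below. The `ArchiveStorage` interface and the in-memory backend are hypothetical; the comparison that skips the 8-character timeline prefix is modeled on what pg_archivecleanup does, but this is not its code:

```python
import re
from typing import Dict, Iterable, List, Protocol

# Plain WAL segment names: 24 hex digits (8 timeline + 16 segment position).
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


class ArchiveStorage(Protocol):
    """Hypothetical interface a POSIX, CIFS or S3-like backend would implement."""

    def list_names(self) -> Iterable[str]: ...

    def delete(self, name: str) -> None: ...


class InMemoryStorage:
    """Toy backend standing in for a real filesystem or object store."""

    def __init__(self, names: Iterable[str]) -> None:
        self.names: Dict[str, bytes] = {n: b"" for n in names}

    def list_names(self) -> Iterable[str]:
        return list(self.names)

    def delete(self, name: str) -> None:
        del self.names[name]


def cleanup(storage: ArchiveStorage, oldest_kept: str) -> List[str]:
    """Delete WAL segments that logically precede oldest_kept.

    The timeline prefix is ignored when comparing; non-segment files such
    as timeline .history files never match WAL_NAME and are left alone.
    """
    removed = []
    for name in sorted(storage.list_names()):
        if WAL_NAME.match(name) and name[8:] < oldest_kept[8:]:
            storage.delete(name)
            removed.append(name)
    return removed
```

Only the two storage methods would differ per backend; the retention logic stays the same wherever the WAL lives.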

> I dislike the idea that PostgreSQL would provide something
> akin to a "pgbackrest repository" in core, or at least I think it
> would be important that we're careful about how much functionality
> gets tied to the presence and use of such a thing, because, at least
> based on my experience working at EnterpriseDB, larger customers often
> don't want to do it that way.

This seems largely independent of the above discussion, but since we're
discussing it, I've certainly had various experiences in this area too.
Some larger customers would like to use an S3-like store (which
pgbackrest already supports, and it will support others going forward
since it has a pluggable storage mechanism for the repo). Then there
are customers who would like to point their Enterprise backup solution
at a directory on disk to back it up (which pgbackrest also supports,
as mentioned previously). And lastly, there are customers who really
want to just back up the PG data directory and would like it to "just
work", thank you, and no, they don't have any thought or concern about
how to handle WAL, but surely it can't be that important, can it?

The last is tongue-in-cheek and I'm half-kidding there, but this is why
I was trying to understand the comments above about what use-case we're
trying to solve here that answers the call for the Enterprise software
crowd, and ideally what distinguishes that from pgbackrest. Even just a
clear-cut "this is what this change will do to make pg_basebackup work
for Enterprise customers" would be great, or even a "well, pg_basebackup
today works for them because it does X, and it'll continue to be able to
do X even after this change."

I'll take a wild shot in the dark to try to help move us through this:
is it that pg_basebackup can stream out to stdout in some cases?
That's quite limited, though, since it means you can't have additional
tablespaces and you can't stream the WAL, and how would that work with
the manifest idea that's being discussed? If there's a directory that
has manifest files in it for each backup, so we have the file sizes for
them, those would need to be accessible when we go to do the
incremental backup and couldn't be stored off somewhere else, I
wouldn't think.
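
For reference, the kind of per-backup manifest meant here could be as simple as the sketch below: size and mtime for each file, one JSON file per backup in a local directory. The layout, file naming, and functions are all hypothetical; the point is that `load_manifest` has to be able to read the prior manifest locally when the incremental runs:

```python
import json
import os
from pathlib import Path
from typing import Dict


def build_manifest(data_dir: Path) -> Dict[str, Dict[str, int]]:
    """Record size and mtime (nanoseconds) for every regular file under data_dir."""
    manifest = {}
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        st = path.stat()
        manifest[path.relative_to(data_dir).as_posix()] = {
            "size": st.st_size,
            "mtime_ns": st.st_mtime_ns,
        }
    return manifest


def save_manifest(manifest_dir: Path, backup_label: str, manifest: Dict) -> Path:
    """Write one manifest per backup, using a hypothetical <dir>/<label>.json layout."""
    manifest_dir.mkdir(parents=True, exist_ok=True)
    target = manifest_dir / f"{backup_label}.json"
    tmp = manifest_dir / f"{backup_label}.json.tmp"
    tmp.write_text(json.dumps(manifest, sort_keys=True))
    os.replace(tmp, target)  # rename into place so readers never see half a manifest
    return target


def load_manifest(manifest_dir: Path, backup_label: str) -> Dict:
    """Read a prior backup's manifest; it must be reachable at incremental time."""
    return json.loads((manifest_dir / f"{backup_label}.json").read_text())
```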

> > That's not great, of course, which is why there are trade-offs to be
> > made, one of which typically involves using timestamps, but doing so
> > quite carefully, to perform the file exclusion. Other ideas are great
> > but it seems like WAL isn't really a great idea unless we make some
> > changes there and we, as in PG, haven't got a robust "we know this file
> > changed as of this point" to work from. I worry that we're putting too
> > much faith into a system to do something independent of what it was
> > actually built and designed to do, and thinking that because we could
> > trust it for X, we can trust it for Y.
>
> That seems like a considerable overreaction to me based on the
> problems reported thus far. The fact is, WAL was originally intended
> for crash recovery and has subsequently been generalized to be usable
> for point-in-time recovery, standby servers, and logical decoding.
> It's clearly established at this point as the canonical way that you
> know what in the database has changed, which is the same need that we
> have for incremental backup.

Provided the WAL level is set where you need it to be, that will be
true for things which are actually supported with PITR, replication to
standby servers, et al. I can see how it might come across as an
overreaction, but this strikes me as a pretty glaring issue, and I worry
that if it was overlooked until now, there'll be other, more subtle
issues. Backups are just plain complicated to get right to begin with,
something that I don't think people appreciate until they've been
dealing with them for quite a while.

Not that this would be the first time we've had issues in this area, and
we'd likely work through them over time, but I'm sure we'd all prefer to
get it as close to right as possible the first time around, and that's
going to require some pretty in-depth review.

> At any rate, the same criticism can be leveled - IMHO with a lot more
> validity - at timestamps. Last-modification timestamps are completely
> outside of our control; they are owned by the OS and various operating
> systems can and do have varying behavior. They can go backwards when
> things have changed; they can go forwards when things have not
> changed. They were clearly not intended to meet this kind of
> requirement. Even, they were intended for that purpose much less so
> than WAL, which was actually designed for a requirement in this
> general ballpark, if not this thing precisely.

While I understand that timestamps may be used for a lot of things and
that the time on a system could go forward or backward, the actual
requirement is:

- If the file was modified after the backup was done, the timestamp (or
the size) needs to be different. It doesn't actually matter whether it
moved forwards or backwards; different is all that's needed. The
timestamp also needs to be before the backup started for the file to be
considered a candidate for skipping.

Is it possible for that to be fooled? Yes, of course, but it isn't as
easily fooled as the typical "just copy files newer than X" approach
that other tools take, at least if you're keeping a manifest of all of
the files, et al, as discussed earlier.
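
The rule above can be written as a small predicate. This is a sketch assuming a manifest entry that holds the size and mtime recorded for each file at the prior backup, and reading "before the backup started" as before the start of that prior backup; `FileState` and `can_skip` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileState:
    size: int
    mtime_ns: int


def can_skip(
    current: FileState,
    recorded: Optional[FileState],
    prior_backup_start_ns: int,
) -> bool:
    """Return True only if the file may be left out of the incremental."""
    if recorded is None:
        return False  # not in the prior manifest: a new file, must copy it
    if current != recorded:
        return False  # size or mtime differs, in either direction
    # Same size and mtime, but unless that mtime is strictly before the prior
    # backup began, a write during that backup could share the timestamp.
    return recorded.mtime_ns < prior_backup_start_ns
```

Note that a timestamp moving backwards is treated exactly like one moving forwards: any difference forces the file to be copied.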

Thanks,

Stephen

In response to

Re: block-level incremental backup at 2019-09-17 14:55:04 from Robert Haas

Responses

Re: block-level incremental backup at 2019-09-17 16:58:23 from Robert Haas
