Re: block-level incremental backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-18 18:05:40
Message-ID: CA+TgmoaPwi0M6o35sDR7Omw4wroDdtN12uXCg8Zn+kDAqYNNmw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 17, 2019 at 6:43 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Sadly, I haven't got any great ideas today. I do know that the WAL-G
> folks have specifically mentioned issues with the visibility map being
> large enough across enough of their systems that it kinda sucks to deal
> with. Perhaps we could do something like the rsync binary-diff protocol
> for non-relation files? This is clearly just hand-waving but maybe
> there's something reasonable in that idea.

I guess it all comes down to how complicated you're willing to make
the client-server protocol. With the very simple protocol that I
proposed -- client provides a threshold LSN and server sends blocks
modified since then -- the client need not have access to the old
incremental backup to take a new one. Of course, if it happens to
have access to the old backup then it can delta-compress however it
likes after-the-fact, but that doesn't help with the amount of network
transfer. That problem could be solved by doing something like what
you're talking about (with some probably-negligible false match rate)
but I have no intention of trying to implement anything that
complicated, and I don't really think it's necessary, at least not for
a first version. What I proposed would already allow, for most users,
a large reduction in transfer and storage costs; what you are talking
about here would help more, but also be a lot more work and impose
some additional requirements on the system. I don't object to you
implementing the more complex system, but I'll pass.
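
To make that concrete: today's replication-protocol grammar is,
slightly simplified,

    BASE_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ WAL ]
                [ NOWAIT ] [ MAX_RATE rate ] [ TABLESPACE_MAP ]

and what I'm proposing might be no more than one extra option -- syntax
invented here purely for illustration, nothing is settled:

    BASE_BACKUP LABEL 'nightly' LSN '0/15D69F8'

where the server would send non-relation files whole and, for relation
files, only those blocks whose page LSN is greater than or equal to the
threshold.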

> There's something like 6 different backup tools, at least, for
> PostgreSQL that provide backup management, so I have a really hard time
> agreeing with this idea that users don't want a PG backup management
> system. Maybe that's not what you're suggesting here, but that's what
> came across to me.

Let me be a little more clear. Different users want different things.
Some people want a canned PostgreSQL backup solution, while other
people just want access to a reasonable set of facilities from which
they can construct their own solution. I believe that the proposal I
am making here could be used either by backup tool authors to enhance
their offerings, or by individuals who want to build up their own
solution using facilities provided by core.

> Unless maybe I'm misunderstanding and what you're suggesting here is
> that the "existing solution" is something like the external PG-specific
> backup tools? But then the rest doesn't seem to make sense, as only
> maybe one or two of those tools use pg_basebackup internally.

Well, what I'm really talking about comes in two pieces: providing some
new facilities via the replication protocol, and making pg_basebackup
able to use those facilities. Nothing would stop other tools from
using those facilities directly if they wish.

> ... but this is exactly the situation we're in already with all of the
> *other* features around backup (parallel backup, backup management, WAL
> management, etc). Users want those features, pg_basebackup/PG core
> doesn't provide it, and therefore there's a bunch of other tools which
> have been written that do. In addition, saying that PG has incremental
> backup but no built-in management of those full-vs-incremental backups
> and telling users that they basically have to build that themselves
> really feels a lot like we're trying to address a check-box requirement
> rather than making something that our users are going to be happy with.

I disagree. Yes, parallel backup, like incremental backup, needs to
go in core. And pg_basebackup should be able to do a parallel backup.
I will fight tooth, nail, and claw any suggestion that the server
should know how to do a parallel backup but pg_basebackup should not
have an option to exploit that capability. And similarly for
incremental.

> I don't think that I was very clear in what my specific concern here
> was. I'm not asking for pg_basebackup to have parallel backup (at
> least, not in this part of the discussion), I'm asking for the
> incremental block-based protocol that's going to be built-in to core to
> be able to be used in a parallel fashion.
>
> The existing protocol that pg_basebackup uses is basically, connect to
> the server and then say "please give me a tarball of the data directory"
> and that is then streamed on that connection, making that protocol
> impossible to use for parallel backup. That's fine as far as it goes
> because only pg_basebackup actually uses that protocol (note that nearly
> all of the other tools for doing backups of PostgreSQL don't...). If
> we're expecting the external tools to use the block-level incremental
> protocol then that protocol really needs to have a way to be
> parallelized, otherwise we're just going to end up with all of the
> individual tools doing their own thing for block-level incremental
> (though perhaps they'd reimplement whatever is done in core but in a way
> that they could parallelize it...), if possible (which I add just in
> case there's some idea that we end up in a situation where the
> block-level incremental backup has to coordinate with the backend in
> some fashion to work... which would mean that *everyone* has to use the
> protocol even if it isn't parallel and that would be really bad, imv).

The obvious way of extending this system to parallel backup is to have
N connections each streaming a separate tarfile such that when you
combine them all you recreate the original data directory. That would
be perfectly compatible with what I'm proposing for incremental
backup. Maybe you have another idea in mind, but I don't know what it
is exactly.
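
Just to sketch it -- none of the syntax below exists, it's purely
illustrative -- the client might open four connections and issue

    BASE_BACKUP LABEL 'p' PARALLEL 4 WORKER 0
    BASE_BACKUP LABEL 'p' PARALLEL 4 WORKER 1
    ...

with the server dividing the files among the workers, each connection
streaming its own tarfile, and the client extracting all four tarfiles
into the same target directory to reassemble the original data
directory.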

> > Wait, you want to make it maximally easy for users to start the server
> > in a state that is 100% certain to result in a corrupted and unusable
> > database? Why?? I'd like to make that a tiny bit difficult. If
> > they really want a corrupted database, they can remove the file.
>
> No, I don't want it to be easy for users to start the server in a state
> that's going to result in a corrupted cluster. That's basically the
> complete opposite of what I was going for- having a file that can be
> trivially removed to start up the cluster is *going* to result in people
> having corrupted clusters, no matter how much we tell them "don't do
> that". This is exactly the problem with have with backup_label today.
> I'd really rather not double-down on that.

Well, OK, but short of scanning the entire directory tree on startup,
I don't see how to achieve that.

> There's really two things here- the first is that I agree with the
> concern about potentially destroying the existing backup if the
> pg_basebackup doesn't complete, but there's some ways to address that
> (such as filesystem snapshotting), so I'm not sure that the idea is
> quite that bad, but it would need to be more than just what
> pg_basebackup does in this case in order to be trustworthy (at least,
> for most).

Well, I did mention in my original email that there could be a
combine-backups-destructively option. I guess this is just taking
that to the next level: merge a backup being taken into an existing
backup on-the-fly. Given your remarks above, it is worth noting that
this GREATLY increases the chances of people accidentally causing
corruption in ways that are almost undetectable. All they have to do
is kill -9 the backup tool halfway through and then start postgres on
the resulting directory.
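
Concretely, with a hypothetical merge tool (name and flag invented for
illustration), the failure mode is as simple as:

    $ pg_combinebackup --in-place /backups/full /backups/incr
    ^C                        # killed halfway through the merge
    $ pg_ctl start -D /backups/full

Nothing at startup notices that some files now come from the old backup
and some from the new.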

> The other part here is the idea of endless incrementals where the blocks
> which don't appear to have changed are never re-validated against what's
> in the backup. Unfortunately, latent corruption happens and you really
> want to have a way to check for that. In past discussions that I've had
> with David, there's been some idea to check some percentage of the
> blocks that didn't appear to change for each backup against what's in
> the backup.

Sure, I'm not trying to block anybody from developing something like
that, and I acknowledge that there is risk in a system like this,
but...

> I share this just to point out that there's some risk to that approach,
> not to say that we shouldn't do it or that we should discourage the
> development of such a feature.

...it seems we are viewing this, at least, from the same perspective.
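
To make the sampling idea concrete, here's a rough standalone sketch of
the kind of spot check being described: re-compare a random fraction of
the blocks that the LSN filter calls "unchanged" against the copies
already in the backup. The tool and its interface are invented for
illustration; only BLCKSZ and the page-header layout are real:

    /* verify_sample.c -- illustration only, not a real tool. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>
    #include <time.h>

    #define BLCKSZ 8192        /* PostgreSQL's default block size */

    /* pd_lsn is the first 8 bytes of every page header (native order). */
    static uint64_t
    page_lsn(const unsigned char *page)
    {
        uint32_t    hi,
                    lo;

        memcpy(&hi, page, 4);
        memcpy(&lo, page + 4, 4);
        return ((uint64_t) hi << 32) | lo;
    }

    int
    main(int argc, char **argv)
    {
        FILE       *live,
                   *bkp;
        unsigned char lbuf[BLCKSZ],
                    bbuf[BLCKSZ];
        uint64_t    threshold;
        long        blkno = 0;
        int         bad = 0;

        if (argc != 4)
        {
            fprintf(stderr, "usage: %s live_seg backup_seg threshold\n",
                    argv[0]);
            return 1;
        }
        live = fopen(argv[1], "rb");
        bkp = fopen(argv[2], "rb");
        if (!live || !bkp)
        {
            perror("fopen");
            return 1;
        }
        threshold = strtoull(argv[3], NULL, 0);
        srand((unsigned) time(NULL));
        while (fread(lbuf, 1, BLCKSZ, live) == BLCKSZ &&
               fread(bbuf, 1, BLCKSZ, bkp) == BLCKSZ)
        {
            /* LSN filter says "unchanged"; re-check ~1% of such blocks. */
            if (page_lsn(lbuf) < threshold && rand() % 100 == 0 &&
                memcmp(lbuf, bbuf, BLCKSZ) != 0)
            {
                fprintf(stderr, "block %ld differs: possible corruption\n",
                        blkno);
                bad = 1;
            }
            blkno++;
        }
        fclose(live);
        fclose(bkp);
        return bad;
    }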

> Wow. I have to admit that I feel completely opposite of that- I'd
> *love* to have an independent tool (which ideally uses the same code
> through the common library, or similar) that can be run to apply WAL.
>
> In other words, I don't agree that it's the server's problem at all to
> solve that, or, at least, I don't believe that it needs to be.

I mean, I guess I'd love to have that if I could get it by waving a
magic wand, but I wouldn't love it if I had to write the code or
maintain it. The routines for applying WAL currently all assume that
you have a whole bunch of server infrastructure present; that code
wouldn't run in a frontend environment, I think. I wouldn't want to
have a second copy of every WAL apply routine that might have its own
set of bugs.
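
To illustrate what I mean by "server infrastructure": every resource
manager plugs its redo routine into a table that looks roughly like
this (abridged from src/include/access/xlog_internal.h):

    typedef struct RmgrData
    {
        const char *rm_name;
        void        (*rm_redo) (XLogReaderState *record);
        void        (*rm_desc) (StringInfo buf, XLogReaderState *record);
        const char *(*rm_identify) (uint8 info);
        void        (*rm_startup) (void);
        void        (*rm_cleanup) (void);
        void        (*rm_mask) (char *pagedata, BlockNumber blkno);
    } RmgrData;

Each rm_redo implementation calls things like XLogReadBufferForRedo(),
which drags in the buffer manager, the storage manager, and so on --
none of which is available in a frontend program today.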

> I've tried to outline how the incremental backup capability and backup
> management are really very closely related and having those be
> implemented by independent tools is not a good interface for our users
> to have to live with.

I disagree. I think the "existing backup tools don't use
pg_basebackup" argument isn't very compelling, because the reason
those tools don't use pg_basebackup is because it can't do what they
need. If it did, they'd probably use it. People don't write a whole
separate engine for running backups just because it's fun to not reuse
code -- they do it because there's no other way to get what they want.

> Most of the external tools don't use pg_basebackup, nor the base backup
> protocol (or, if they do, it's only as an option among others). In my
> opinion, that's pretty clear indication that pg_basebackup and the base
> backup protocol aren't sufficient to cover any but the simplest of
> use-cases (though those simple use-cases are handled rather well).
> We're talking about adding on a capability that's much more complicated
> and is one that a lot of tools have already taken a stab at, let's try
> to do it in a way that those tools can leverage it and avoid having to
> implement it themselves.

I mean, again, if it were part of pg_basebackup and available via the
replication protocol, they could do exactly that, through either
method. I don't get it. You seem to be arguing that we shouldn't add
the necessary capabilities to the replication protocol or
pg_basebackup, but at the same time arguing that pg_basebackup is
inadequate because it's missing important capabilities. This confuses
me.

> It's an interesting idea to add in everything to pg_basebackup that
> users doing backups would like to see, but that's quite a list:
>
> - full backups
> - differential backups
> - incremental backups / block-level backups
> - (server-side) compression
> - (server-side) encryption
> - page-level checksum validation
> - calculating checksums (on the whole file)
> - External object storage (S3, et al)
> - more things...
>
> I'm really not convinced that I agree with the division of labor as
> you've outlined it, where all of the above is done by pg_basebackup,
> where just archiving and backup retention are handled by some external
> tool (except that we already have pg_receivewal, so archiving isn't
> really an externally handled thing either, unless you want features like
> parallel archive-push or parallel archive-get...).

Yeah, if it were up to me, I'd choose to put most of that in the server
and make it available via the replication protocol, and then make
pg_basebackup able to use that functionality. And external tools
could use that functionality via pg_basebackup or by using the
replication protocol directly. I actually don't really understand
what the alternative is. If you want server-side compression, for
example, that really has to be done on the server. And how would the
server expose that, except through the replication protocol? Sure, we
could design a new protocol for it. Call it... say... the
shmeplication protocol. And then you could use the replication
protocol for what it does today and the shmeplication protocol for all
the cool bits. But why would that be better?
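
For instance, server-side compression could be just one more option on
the existing command -- hypothetical syntax, of course:

    BASE_BACKUP LABEL 'nightly' COMPRESSION 'gzip' COMPRESSION_LEVEL 5

Any client that speaks the replication protocol, pg_basebackup
included, would then get it for free; nothing about that requires a
second protocol.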

> What would really help me, at least, understand the idea here would be
> to understand exactly what the existing tools do that the subset of
> users you're thinking about doesn't like/want, but which pg_basebackup,
> today, does. Is the issue that there's a repository instead of just a
> plain PG directory or set of tar files, like what pg_basebackup produces
> today? But how would we do things like have compression, or encryption,
> or block-based incremental backups without some kind of repository or
> directory that doesn't actually look exactly like a PG data directory?

I guess we're still wallowing in the same confusion here.
pg_basebackup, for me, is just a convenient place to stick this
functionality. If the server has the ability to construct and send an
incremental backup by some means, then it needs a client on the other
end to receive and store that backup, and since pg_basebackup already
knows how to do that for full backups, extending it to incremental
backups (and/or parallel, encrypted, compressed, and validated
backups) seems very natural to me. Otherwise I add server-side
functionality to allow $X and then have to write an entirely new
client to interact with that instead of just using the client I've
already got. That's more work, and I'm lazy.
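
In other words, the user-visible surface might be no more than
something like this, with the option name invented for illustration:

    pg_basebackup -D /backups/incr --lsn-threshold='0/15D69F8'

and everything else -- connection handling, tar vs. plain format, WAL
streaming -- working exactly as it does for a full backup today.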

Now it's true that if we wanted to build something like the rsync
protocol into PostgreSQL, jamming that into pg_basebackup might well
be a bridge too far. That would involve taking backups via a method
so different from what we're currently doing that it would probably
make sense to at least consider creating a whole new tool for that
purpose. But that wasn't my proposal...
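
For anyone following along: the heart of the rsync approach is a weak
rolling checksum that can slide through a file one byte at a time, with
candidate matches confirmed by a strong hash -- which is where the
small-but-nonzero false match rate I mentioned above comes from. A
from-memory sketch of the weak checksum, illustrative rather than
rsync's actual code:

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /*
     * rsync-style weak rolling checksum: s1 is the byte sum, s2 the
     * sum of the running s1 values, both kept mod 2^16.
     */
    typedef struct
    {
        uint32_t    s1;
        uint32_t    s2;
        size_t      len;
    } rollsum;

    static void
    rollsum_init(rollsum *r, const unsigned char *buf, size_t len)
    {
        size_t      i;

        r->s1 = r->s2 = 0;
        r->len = len;
        for (i = 0; i < len; i++)
        {
            r->s1 += buf[i];
            r->s2 += r->s1;
        }
        r->s1 &= 0xffff;
        r->s2 &= 0xffff;
    }

    /* Slide the window one byte: "out" leaves, "in" enters. */
    static void
    rollsum_roll(rollsum *r, unsigned char out, unsigned char in)
    {
        r->s1 = (r->s1 - out + in) & 0xffff;
        r->s2 = (r->s2 - (uint32_t) (r->len * out) + r->s1) & 0xffff;
    }

    static uint32_t
    rollsum_digest(const rollsum *r)
    {
        return (r->s2 << 16) | r->s1;
    }

    int
    main(void)
    {
        const unsigned char data[] =
        "the quick brown fox jumps over the lazy dog";
        rollsum     r;

        /* Checksum the first 16 bytes, then slide the window by one. */
        rollsum_init(&r, data, 16);
        printf("window[0..15]: %08x\n", rollsum_digest(&r));
        rollsum_roll(&r, data[0], data[16]);
        printf("window[1..16]: %08x\n", rollsum_digest(&r));
        return 0;
    }

A 32-bit digest like this will occasionally collide, so candidate
matches still have to be confirmed with a strong hash before being
trusted.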

> I certainly can understand that there are PostgreSQL users who want to
> leverage incremental backups without having to use BART or another tool
> outside of whatever enterprise backup system they've got, but surely
> that's a large pool of users who *do* want a PG backup tool that manages
> backups, or you wouldn't have spent a considerable amount of your very
> valuable time hacking on BART. I've certainly seen a fair share of both
> and I don't think we should set out to exclude either.

Sure, I agree.

> Perhaps that's what we're both saying too and just talking past each
> other, but I feel like the approach here is "make it work just for the
> simple pg_basebackup case and not worry too much about the other tools,
> since what we do for pg_basebackup will work for them too" while where
> I'm coming from is "focus on what the other tools need first, and then
> make pg_basebackup work with that if there's a sensible way to do so."

I think perhaps the disconnect is that I just don't see how it can
fail to work for the external tools if it works for pg_basebackup.
Any given piece of functionality is either available in the
replication stream, or it's not. I suspect that neither BART nor
pg_backrest will be able to completely give up on having its own backup
engine solely because core has incremental backup, but I don't know
what the alternative to adding features to core one at a time is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
