Re: block-level incremental backup

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: block-level incremental backup
Date:	2019-04-18 20:59:12
Message-ID:	20190418205912.GI6197@tamriel.snowman.net
Lists:	pgsql-hackers

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Wed, Apr 17, 2019 at 5:20 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > As I understand it, the problem is not with backing up an individual
> > database or cluster, but rather dealing with backing up thousands of
> > individual clusters with thousands of tables in each, leading to an
> > awful lot of tables with lots of FSMs/VMs, all of which end up having to
> > get copied and stored wholesale. I'll point this thread out to him and
> > hopefully he'll have a chance to share more specific information.
>
> Sounds good.

Ok, done.

> > I can agree with the idea of having multiple options for how to collect
> > up the set of changed blocks, though I continue to feel that a
> > WAL-scanning approach isn't something that we'd have implemented in the
> > backend at all since it doesn't require the backend and a given backend
> > might not even have all of the WAL that is relevant. I certainly don't
> > think it makes sense to have a backend go get WAL from the archive to
> > then merge the WAL to provide the result to a client asking for it-
> > that's adding entirely unnecessary load to the database server.
>
> My motivation for wanting to include it in the database server was twofold:
>
> 1. I was hoping to leverage the background worker machinery. The
> WAL-scanner would just run all the time in the background, and start
> up and shut down along with the server. If it's a standalone tool,
> then it can run on a different server or when the server is down, both
> of which are nice. The downside though is that now you probably have
> to put it in crontab or under systemd or something, instead of just
> setting a couple of GUCs and letting the server handle the rest. For
> me that downside seems rather significant, but YMMV.

Background workers can be used to do pretty much anything. I'm not
suggesting that's a bad thing- just that it's such a completely generic
tool that could be used to put anything/everything into the backend, so
I'm not sure how much it makes sense as an argument when it comes to
designing a new capability/feature. Yes, there's an advantage there
when it comes to configuration since that means we don't need to set up
a cronjob and can, instead, just set a few GUCs... but it also means
that it *must* be done on the server and there's no option to do it
elsewhere, as you say.

When it comes to "this is something that I can do on the DB server or on
some other server", the usual preference is to use another system for
it, to reduce load on the server.

If it comes down to something that needs to/should be an ongoing
process, then the packaging can ship it as a daemon-type tool and
handle the systemd integration for it, assuming the stand-alone tool
supports that, which it hopefully would.

> 2. In order for the information produced by the WAL-scanner to be
> useful, it's got to be available to the server when the server is
> asked for an incremental backup. If the information is constructed by
> a standalone frontend tool, and stored someplace other than under
> $PGDATA, then the server won't have convenient access to it. I guess
> we could make it the client's job to provide that information to the
> server, but I kind of liked the simplicity of not needing to give the
> server anything more than an LSN.

If the WAL-scanner tool is a stand-alone tool, and it handles picking
out all of the FPIs and incremental page changes for each relation, then
what does the tool that builds the "new" backup really need to tell the
backend? I feel like it mainly needs to ask the backend for the
non-relation files, which gets into at least one approach that I've
thought about for redesigning the backup protocol:

1. Ask for a list of files and metadata about them
2. Allow asking for individual files
3. Support multiple connections asking for individual files

Quite a few of the existing backup tools for PG use a model along these
lines (or use tools underneath which do).
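
As a rough illustration of the three steps above, here is a minimal
Python sketch. It is not an existing PostgreSQL protocol or API:
FakeBackupServer, list_files, fetch_file and run_backup are all
hypothetical names, the "server" is just an in-memory dict, and treating
everything under base/ as a relation file is a simplifying assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    size: int


class FakeBackupServer:
    """In-memory stand-in for a backend (hypothetical, not a real API)."""

    def __init__(self, files):
        self._files = dict(files)

    def list_files(self):
        # Step 1: a list of files plus metadata about them.
        return [FileInfo(p, len(d)) for p, d in sorted(self._files.items())]

    def fetch_file(self, path):
        # Step 2: a request for one individual file.
        return self._files[path]


def run_backup(server, wanted, connections=4):
    """Fetch every file selected by `wanted`, several at a time."""
    manifest = [f for f in server.list_files() if wanted(f)]
    # Step 3: each worker thread stands in for a separate connection.
    with ThreadPoolExecutor(max_workers=connections) as pool:
        contents = pool.map(lambda f: server.fetch_file(f.path), manifest)
        return {f.path: data for f, data in zip(manifest, contents)}


server = FakeBackupServer({
    "base/1/1259": b"x" * 8,
    "global/pg_control": b"ctl",
    "postgresql.conf": b"conf",
})
# Ask only for the non-relation files, as discussed above.
backup = run_backup(server, lambda f: not f.path.startswith("base/"))
```

The point of the split is that the client, not the server, decides which
files to request and how many connections to spread them over.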

> > A thought that occurs to me is to have the functions for supporting the
> > WAL merging be included in libcommon and available to both the
> > independent executable that's available for doing WAL merging, and to
> > the backend to be able to do WAL merging itself-
>
> Yeah, that might be possible.

I feel like this would be necessary, as it's certainly delicate and
critical code and having multiple implementations of it will be
difficult to manage.

That said... we already have independent work going on to do WAL
merging (WAL-G, at least), and if we insist that the WAL replay code
only exists in the backend, I strongly suspect we'll end up with
independent implementations of that too. Sure, we can distance
ourselves from that and say that we don't have to deal with any bugs
from it... but it seems like the better approach would be to have a
common library that provides it.

> > but for a specific
> > purpose: having a way to reduce the amount of WAL that needs to be sent
> > to a replica which has a replication slot but that's been disconnected
> > for a while. Of course, there'd have to be some way to handle the other
> > files for that to work to update a long out-of-date replica. Now, if we
> > taught the backup tool about having a replication slot then perhaps we
> > could have the backend effectively have the same capability proposed
> > above, but without the need to go get the WAL from the archive
> > repository.
>
> Hmm, but you can't just skip over WAL records or segments because
> there are checksums and previous-record pointers and things....

Those aren't what I would be worried about, I'd think? Maybe we're
talking about different things, but if there's a way to scan/compress
WAL so that we have less work to do when replaying, then we should
leverage that for replicas that have been disconnected for a while too.

One important bit here is that the replica wouldn't be able to answer
queries while it's working through this compressed WAL, since it
wouldn't reach a consistent state until more-or-less the end of WAL, but
I am not sure that's a bad thing; who wants to get responses back from a
very out-of-date replica?

> > I'm still not entirely sure that this makes sense to do in the backend
> > due to the additional load, this is really just some brainstorming.
>
> Would it really be that much load?

Well, it'd clearly be more than zero. There may be an argument to be
made that it's worth it to reduce the overall throughput of the system
in order to add this capability, but I don't think we've got enough
information at this point to know. My gut feeling, at least, is that
tracking enough information to do WAL-compression on a high-write system
is going to be pretty expensive as you'd need to have a data structure
that makes it easy to identify every page in the system, and be able to
find each of them later on in the stream, and then throw away the old
FPI in favor of the new one, and then track all the incremental page
updates to that page, more-or-less, right?
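
To make the bookkeeping described above concrete, here is a minimal
Python sketch under stated assumptions. The record layout (lsn, page,
kind, payload), the "fpi"/"delta" kinds, and the names PageState and
compress are all hypothetical; real WAL records are considerably more
involved than this.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class PageState:
    # Most recent full-page image seen for this page, as (lsn, image).
    fpi: Optional[Tuple[int, bytes]] = None
    # Incremental changes recorded after that image, as (lsn, change).
    deltas: list = field(default_factory=list)


def compress(records):
    """Keep, per page, only the newest FPI and the changes that follow it."""
    pages = {}
    for lsn, page, kind, payload in records:
        state = pages.setdefault(page, PageState())
        if kind == "fpi":
            # A newer FPI supersedes the old one and everything before it.
            state.fpi = (lsn, payload)
            state.deltas.clear()
        else:
            state.deltas.append((lsn, payload))
    return pages


records = [
    (1, ("rel", 0), "fpi", b"A"),
    (2, ("rel", 0), "delta", b"d1"),
    (3, ("rel", 0), "fpi", b"B"),
    (4, ("rel", 0), "delta", b"d2"),
    (5, ("rel", 1), "delta", b"d3"),
]
pages = compress(records)
```

Even in this toy form, the dictionary holds an entry for every page
touched in the stream, which is where the tracking cost comes from.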

On a large system, given how much information has to be tracked, it
seems like it could be a fair bit of load, but perhaps you've got some
ideas as to how to reduce it..?

Thanks!

Stephen

In response to

Re: block-level incremental backup at 2019-04-18 16:56:10 from Robert Haas
