Re: block-level incremental backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-18 16:56:10
Message-ID: CA+TgmoavP90mOQ-VF7BGd7JRUd2yez9R46Q6GgJJfKK_4K0rrg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 17, 2019 at 5:20 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> As I understand it, the problem is not with backing up an individual
> database or cluster, but rather dealing with backing up thousands of
> individual clusters with thousands of tables in each, leading to an
> awful lot of tables with lots of FSMs/VMs, all of which end up having to
> get copied and stored wholesale. I'll point this thread out to him and
> hopefully he'll have a chance to share more specific information.

Sounds good.

> I can agree with the idea of having multiple options for how to collect
> up the set of changed blocks, though I continue to feel that a
> WAL-scanning approach isn't something that we'd have implemented in the
> backend at all since it doesn't require the backend and a given backend
> might not even have all of the WAL that is relevant. I certainly don't
> think it makes sense to have a backend go get WAL from the archive to
> then merge the WAL to provide the result to a client asking for it-
> that's adding entirely unnecessary load to the database server.

My motivation for wanting to include it in the database server was twofold:

1. I was hoping to leverage the background worker machinery. The
WAL-scanner would just run all the time in the background, and start
up and shut down along with the server. If it's a standalone tool
instead, then it can run on a different server or when the server is
down, both of which are nice. The downside, though, is that now you
probably have to put it in crontab or under systemd or something,
instead of just setting a couple of GUCs and letting the server handle
the rest. For me that downside seems rather significant, but YMMV.

2. In order for the information produced by the WAL-scanner to be
useful, it's got to be available to the server when the server is
asked for an incremental backup. If the information is constructed by
a standalone frontend tool, and stored someplace other than under
$PGDATA, then the server won't have convenient access to it. I guess
we could make it the client's job to provide that information to the
server, but I kind of liked the simplicity of not needing to give the
server anything more than an LSN.
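
To make the "nothing more than an LSN" idea concrete, here is a rough
sketch of the lookup the server would do. Everything in it is
hypothetical: WalSummary, BlockRef and blocks_changed_since are names
made up for illustration, and LSNs are plain integers. It assumes the
WAL-scanner leaves behind one summary per LSN range, each listing the
blocks touched in that range.

```python
from dataclasses import dataclass, field

# Hypothetical types for illustration only; none of this mirrors real
# PostgreSQL structures.
BlockRef = tuple[str, int]  # (relation file path, block number)


@dataclass
class WalSummary:
    """Blocks modified by WAL in the half-open LSN range [start_lsn, end_lsn)."""
    start_lsn: int
    end_lsn: int
    blocks: set[BlockRef] = field(default_factory=set)


def blocks_changed_since(summaries, backup_lsn, current_lsn):
    """Return every block touched in [backup_lsn, current_lsn).

    Raises LookupError if the summaries leave a gap in that range, since
    an incremental backup built on incomplete information would be wrong.
    A summary that starts before backup_lsn is used whole, so the result
    may include a few blocks too many, which is safe but not minimal.
    """
    changed = set()
    covered_to = backup_lsn
    for s in sorted(summaries, key=lambda s: s.start_lsn):
        # Skip summaries entirely outside the requested range.
        if s.end_lsn <= backup_lsn or s.start_lsn >= current_lsn:
            continue
        if s.start_lsn > covered_to:
            raise LookupError(
                f"no summary covers LSN range [{covered_to}, {s.start_lsn})"
            )
        changed |= s.blocks
        covered_to = max(covered_to, s.end_lsn)
    if covered_to < current_lsn:
        raise LookupError(
            f"no summary covers LSN range [{covered_to}, {current_lsn})"
        )
    return changed
```

If the summaries live somewhere the server can't read, the client has
to ship the equivalent of the summaries list along with the LSN, which
is the extra step I was hoping to avoid.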

> A thought that occurs to me is to have the functions for supporting the
> WAL merging be included in libcommon and available to both the
> independent executable that's available for doing WAL merging, and to
> the backend to be able to do WAL merging itself-

Yeah, that might be possible.

> but for a specific
> purpose: having a way to reduce the amount of WAL that needs to be sent
> to a replica which has a replication slot but that's been disconnected
> for a while. Of course, there'd have to be some way to handle the other
> files for that to work to update a long out-of-date replica. Now, if we
> taught the backup tool about having a replication slot then perhaps we
> could have the backend effectively have the same capability proposed
> above, but without the need to go get the WAL from the archive
> repository.

Hmm, but you can't just skip over WAL records or segments, because
every record carries a checksum and a previous-record pointer, and
a reader validates both.
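
A toy model of the problem, in case it helps. This is not the real
record format: the actual header has xl_prev and xl_crc fields and
uses CRC-32C, whereas the sketch below uses zlib's CRC-32 as a
stand-in and invents a trivial layout. Record, make_record and
first_invalid are made-up names.

```python
import struct
import zlib
from typing import NamedTuple


class Record(NamedTuple):
    lsn: int        # position of this record
    prev_lsn: int   # position of the record before it (0 for the first)
    payload: bytes
    crc: int        # checksum over the header fields and the payload


def _checksum(lsn, prev_lsn, payload):
    # Stand-in checksum; real WAL uses CRC-32C over a different layout.
    return zlib.crc32(struct.pack("<QQ", lsn, prev_lsn) + payload)


def make_record(lsn, prev_lsn, payload):
    return Record(lsn, prev_lsn, payload, _checksum(lsn, prev_lsn, payload))


def first_invalid(records):
    """Return the index of the first record that fails validation, or None.

    A record is rejected if its checksum is wrong or if its back-pointer
    does not match the record actually read before it.
    """
    prev_lsn = 0
    for i, r in enumerate(records):
        if r.crc != _checksum(r.lsn, r.prev_lsn, r.payload):
            return i
        if r.prev_lsn != prev_lsn:
            return i
        prev_lsn = r.lsn
    return None
```

Dropping a record from the middle breaks the back-pointer of the one
after it, and patching that pointer without recomputing the checksum
just moves the failure to the checksum test. So a filtered stream
would have to be rewritten, not merely thinned out.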

> I'm still not entirely sure that this makes sense to do in the backend
> due to the additional load, this is really just some brainstorming.

Would it really be that much load?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
