Re: block-level incremental backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-20 20:11:11
Message-ID: CA+Tgmoa7W=s+1D2WP9+3=7TNon7UWkZuQ3yKCD9wBR93xsmcMA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 20, 2019 at 12:19 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> > What I'm NOT willing to
> > do is build a whole bunch of infrastructure that will help pgbackrest
> > do amazing things but will not provide a simple and convenient way of
> > taking incremental backups using only core tools. I do care about
> > having something that's good for pgbackrest and other out-of-core
> > tools. I just care about it MUCH LESS than I care about making
> > PostgreSQL core awesome.
>
> Then I misunderstood your original proposal where you talked about
> providing something that the various external tools could use. If you'd
> like to *just* provide a mechanism for pg_basebackup to be able to do a
> trivial incremental backup, great, but it's not going to be useful or
> used by the external tools, just like the existing base backup protocol
> isn't used by the external tools because it can't be used in a parallel
> fashion.

Well, what I meant - and perhaps I wasn't clear enough about this - is
that it could be used by an external solution for *managing* backups,
not so much an external engine for *taking* backups. But actually, I
really don't see any reason why the latter wouldn't also be possible.
It was already suggested upthread by Anastasia that there should be a
way to ask the server to give only the identity of the modified blocks
without the contents of those blocks; if we provide that, then a tool
can get those and do whatever it likes with them, including fetching
them in parallel by some other means. Another obvious extension would
be to add a command that says 'give me this file' or 'give me this
file but only this list of blocks', which would give clients lots of
options: they could provide their own lists of blocks to fetch
computed by whatever internal magic they have, or they could request
the server's modified-block map information first and then schedule
fetching those blocks in parallel using this new command. So it seems
like with some pretty straightforward extensions this can be made
usable by and valuable to people wanting to build external backup
engines, too. I do not necessarily feel obliged to implement every
feature that might help with that kind of thing just because I've
expressed an interest in this general area, but I might do some of
them, and maybe people like you or Anastasia who want to make these
facilities available to external tools can help with some of the work,
too.
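
As a rough illustration of the two extensions described above (a
modified-block map request plus a 'give me this file but only this list
of blocks' command), here is a minimal Python sketch of how an external
tool might combine them to fetch blocks in parallel. Everything in it
is an assumption made for illustration: FakeServer, modified_block_map,
fetch_blocks, incremental_fetch and the file paths are hypothetical
names, not existing PostgreSQL commands or APIs, and the server is
simulated in memory.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 8192  # PostgreSQL's default block size


class FakeServer:
    """In-memory stand-in for a server. Hypothetical: these methods are
    NOT real PostgreSQL commands, only models of the ideas in the mail."""

    def __init__(self, files, modified):
        self.files = files        # path -> file contents (bytes)
        self.modified = modified  # path -> list of modified block numbers

    def modified_block_map(self):
        # Models "give only the identity of the modified blocks".
        return {path: list(blocks) for path, blocks in self.modified.items()}

    def fetch_blocks(self, path, blocks):
        # Models "give me this file but only this list of blocks".
        data = self.files[path]
        return {b: data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE] for b in blocks}


def incremental_fetch(server, workers=4, batch=2):
    """Ask for the block map first, then fetch the listed blocks in parallel."""
    block_map = server.modified_block_map()
    # Split each file's block list into small batches that can run concurrently.
    jobs = [
        (path, blocks[i:i + batch])
        for path, blocks in block_map.items()
        for i in range(0, len(blocks), batch)
    ]
    result = {path: {} for path in block_map}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in job order, so zip pairs them correctly.
        fetched_batches = pool.map(lambda job: server.fetch_blocks(*job), jobs)
        for (path, _), fetched in zip(jobs, fetched_batches):
            result[path].update(fetched)
    return result


def make_file(nblocks):
    # Block n is filled with byte value n, so fetched blocks are easy to check.
    return b"".join(bytes([n]) * BLOCK_SIZE for n in range(nblocks))


# Made-up relation file paths and modified-block lists, for illustration only.
server = FakeServer(
    files={"base/1/1259": make_file(6), "base/1/2619": make_file(3)},
    modified={"base/1/1259": [0, 3, 5], "base/1/2619": [2]},
)
backup = incremental_fetch(server)
```

A client with its own idea of which blocks it needs could skip the
modified_block_map step and call fetch_blocks with its own lists, which
is the other option mentioned above.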

That being said, as long as there is significant demand for
value-added backup features over and above what is in core, there are
probably going to be non-core backup tools that do things their own
way instead of just leaning on whatever the server provides natively.
In a certain sense that's regrettable, because it means that somebody
- or perhaps multiple somebodies - goes to the trouble of doing
something outside core and then somebody else puts something in core
that obsoletes it, and therein lies duplication of effort. On the
other hand, it also allows people to innovate way faster than can be
done in core, it allows competition among different possible designs,
and it's just kinda the way we roll around here. I can't get very
worked up about it.

One thing I'm definitely not going to do here is abandon my goal of
producing a *simple* incremental backup solution that can be deployed
*easily* by users. I understand from your remarks that such a solution
will not suit everybody. However, unlike you, I do not believe that
pg_basebackup was a failure. I certainly agree that it has some
limitations that mean that it is hard to use in large deployments, but
it's also *extremely* convenient for people with a fairly small
database when they just need a quick and easy backup. Adding some
more features to it - such as incremental backup - will make it useful
to more people in more cases. There will doubtless still be people
who need more, and that's OK: those people can use a third-party tool.
I will not get anywhere trying to solve every problem at once.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
