Re: block-level incremental backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-21 23:02:26
Message-ID: CA+TgmobZfdfdmL2m-QcY++LERZYtgqaomeEAapbdHiooJycKWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 20, 2019 at 4:32 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Having been around for a while working on backup-related things, if I
> was to implement the protocol for pg_basebackup today, I'd definitely
> implement "give me a list" and "give me this file" rather than the
> tar-based approach, because I've learned that people want to be
> able to do parallel backups and that's a decent way to do that. I
> wouldn't set out and implement something new that's there's just no hope
> of making parallel. Maybe the first write of pg_basebackup would still
> be simple and serial since it's certainly more work to make a frontend
> tool like that work in parallel, but at least the protocol would be
> ready to support a parallel option being added alter without being
> rewritten.
>
> And that's really what I was trying to get at here- if we've got the
> choice now to decide what this is going to look like from a protocol
> level, it'd be great if we could make it able to support being used in a
> parallel fashion, even if pg_basebackup is still single-threaded.

I think we're getting closer to a meeting of the minds here, but I
don't think it's intrinsically necessary to rewrite the whole method
of operation of pg_basebackup to implement incremental backup in a
sensible way. One could instead just do a straightforward extension
to the existing BASE_BACKUP command to enable incremental backup.
Then, to enable parallel full backup and all sorts of out-of-core
hacking, one could expand the command language to allow tools to
access individual steps: START_BACKUP, SEND_FILE_LIST,
SEND_FILE_CONTENTS, STOP_BACKUP, or whatever. The second thing makes
for an appealing project, but I do not think there is a technical
reason why it has to be done first. Or for that matter why it has to
be done second. As I keep saying, incremental backup and full backup
are separate projects and I believe it's completely reasonable for
whoever is doing the work to decide on the order in which they would
like to do the work.

Having said that, I'm curious what people other than Stephen (and
other pgbackrest hackers) think about the relative value of parallel
backup vs. incremental backup. Stephen appears quite convinced that
parallel backup is full of win and incremental backup is a bit of a
yawn by comparison, and while I certainly would not want to discount
the value of his experience in this area, it sometimes happens on this
mailing list that [ drum roll please ] not everybody agrees about
everything. So, what do other people think?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2019-04-21 23:50:14 Re: jsonpath
Previous Message Tom Lane 2019-04-21 22:39:47 Re: jsonpath