Re: block-level incremental backup

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-18 21:17:02
Lists: pgsql-hackers


I wanted to respond to this point specifically as I feel like it'll
really help clear things up when it comes to the point of view I'm
seeing this from.

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> > Perhaps that's what we're both saying too and just talking past each
> > other, but I feel like the approach here is "make it work just for the
> > simple pg_basebackup case and not worry too much about the other tools,
> > since what we do for pg_basebackup will work for them too" while where
> > I'm coming from is "focus on what the other tools need first, and then
> > make pg_basebackup work with that if there's a sensible way to do so."
> I think perhaps the disconnect is that I just don't see how it can
> fail to work for the external tools if it works for pg_basebackup.

The existing backup protocol that pg_basebackup uses *does* *not* *work*
for the external backup tools. If it worked, they'd use it, but they
don't and that's because you can't do things like a parallel backup,
which we *know* users want because there are a number of tools which
implement that exact capability.

I do *not* want another piece of functionality added in this space which
is limited in the same way because it does *not* help the external
backup tools at all.

> Any given piece of functionality is either available in the
> replication stream, or it's not. I suspect that for both BART and
> pg_backrest, they won't be able to completely give up on having their
> own backup engines solely because core has incremental backup, but I
> don't know what the alternative to adding features to core one at a
> time is.

This idea that it's either "in the replication system" or "not in the
replication system" is really bad, in my view, because it can be "in the
replication system" and at the same time not at all useful to the
existing external backup tools, but users and others will see the
"checkbox" as ticked and assume that it's available in a useful fashion
by the backend and then get upset when they discover the limitations.

The existing base backup/replication protocol that's used by
pg_basebackup is *not* useful to most of the backup tools, that's quite
clear since they *don't* use it. Building onto that an incremental
backup solution that is similarly limited isn't going to make things
easier for the external tools.

If the goal is to make things easier for the external tools by providing
capability in the backend / replication protocol then we need to be
looking at what those tools require and not at what would be minimally
sufficient for pg_basebackup. If we don't care about the external tools
and *just* care about making it work for pg_basebackup, then let's be
clear about that, and accept that it'll have to be, most likely, ripped
out and rewritten when we go to add parallel capabilities, for example,
to pg_basebackup down the road. That's clearly the case for the
existing "base backup" protocol, so I don't see why it'd be different
for an incremental backup system that is similarly designed and

To be clear, I'm all for adding features to core one at a time, but
there are different ways to implement features and that's really what
we're talking about here: what's the best way to implement this
feature, ideally in a way that it's useful, practically, to both
pg_basebackup and the other external backup utilities.


