Re: backup manifests

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, David Steele <david(at)pgmasters(dot)net>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2020-01-08 01:33:48
Message-ID: 20200108013348.GF3195@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Fri, Jan 3, 2020 at 2:35 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > Well, I don't know how to make you happy here.
> >
> > I suppose I should admit that, first off, I don't feel you're required
> > to make me happy, and I don't think it's necessary to make me happy to
> > get this feature into PG.
>
> Fair enough. That is gracious of you, but I would like to try to make
> you happy if it is possible to do so.

I certainly appreciate that, but I don't know that it is possible to do
so while approaching this in the order that you are, which I tried to
point out previously.

> > Since you expressed that interest though, I'll go out on a limb and say
> > that what would make me *really* happy would be to think about where the
> > project should be taking pg_basebackup, what we should be working on
> > *today* to address the concerns we hear about from our users, and to
> > consider the best way to implement solutions to what they're actively
> > asking for a core backup solution to be providing. I get that maybe
> > that isn't how the world works and that sometimes we have people who
> > write our paychecks wanting us to work on something else, and yes, I'm
> > sure there are some users who are asking for this specific thing but I
> > certainly don't think it's a common ask of pg_basebackup or what users
> > feel is missing from the backup options we offer in core; we had users
> > on this list specifically saying they *wouldn't* use this feature
> > (referring to the differential backup stuff, of course), in fact,
> > because of the things which are missing, which is pretty darn rare.
>
> Well, I mean, what you seem to be suggesting here is that somebody is
> driving me with a stick to do something that I don't really like but
> have to do because otherwise I won't be able to make rent, but that's
> actually not the case. I genuinely believe that this is a good design,
> and it's driven by me, not some shadowy conglomerate of EnterpriseDB
> executives who are out to make PostgreSQL suck. If I'm wrong and the
> design sucks, that's again not the fault of shadowy EnterpriseDB
> executives; it's my fault. Incidentally, my boss is not very shadowy
> anyhow; he's a super-nice guy, and a major reason why I work here. :-)

Then I just have to disagree, really vehemently, that having a
block-level incremental backup solution without solid dependency
handling between incremental and full backups, solid WAL management and
archiving, expiration handling for incremental/full backups and WAL, and
the manifest that this thread has been about, is a good design.

Ultimately, what this calls for is some kind of 'repository' which
you've stressed you don't think is a good idea for pg_basebackup to ever
deal with and I just can't disagree more with that. I could perhaps
agree that it isn't appropriate for the specific tool "pg_basebackup" to
work with a repo because of the goal of that particular tool, but in
that case, I don't think pg_basebackup should be the tool to provide a
block-level incremental backup solution, it should continue to be a tool
to provide a simple and easy way to take a one-time, complete, snapshot
of a running PG system over the replication protocol- and adding support
for parallel backups, or encrypted backups, or similar things would be
completely in-line and appropriate for such a tool, and I'm not against
those features being added to pg_basebackup even in advance of anything
like support for a repo or dependency handling.

> I don't think the issue here is that I haven't thought about what
> users want, but that not everybody wants the same thing, and it
> seems like the people with whom I interact want somewhat different
> things than those with whom you interact. EnterpriseDB has an existing
> tool that does parallel and block-level incremental backup, and I
> started out with the goal of providing those same capabilities in
> core. They are quite popular with EnterpriseDB customers, and I'd like
> to make them more widely available and, as far as I can, improve on
> them. From our previous discussion and from a (brief) look at
> pgbackrest, I gather that the interests of your customers are somewhat
> different. Apparently, block-level incremental backup isn't quite as
> important to your customers, perhaps because you've already got
> file-level incremental backup, but various other things like
> encryption and backup verification are extremely important, and you've
> got a set of ideas about what would be valuable in the future which
> I'm sure is based on real input from your customers. I hope you pursue
> those ideas, and I hope you do it in core rather than in a separate
> piece of software, but that's up to you. Meanwhile, I think that if I
> have somewhat different ideas about what I'd like to pursue, that
> ought to be just fine. And I don't think it is unreasonable to hope
> that you'll acknowledge my goals as legitimate even if you have
> different ones.

I'm all for block-level incremental backup, in general (though I've got
concerns about it from a correctness standpoint.. I certainly think
it's going to be difficult to get right and probably finicky, but
hopefully your experience with BART has let you identify where the
dragons lie and it'll be interesting to see what that code looks like
and if the approach used can be leveraged in other tools), but I am
concerned about how we're getting there.

> I want to point out that my idea about how to do all of this has
> shifted by a considerable amount based on the input that you and David
> have provided. My original design didn't involve a backup manifest,
> but now it does. That turned out to be necessary, but it was also
> something you suggested, and something where I asked and took advice
> on what ought to go into it. Likewise, you suggested that the process
> of taking the backup should involve giving the client more control
> rather than trying to do everything on the server side, and that is
> now the design which I plan to pursue. You suggested that because it
> would be more advantageous for out-of-core backup tools, such as
> pgbackrest, and I acknowledge that as a benefit and I think we're
> headed in that direction. I am not doing a single thing which, to my
> knowledge, blocks anything that you might want to do with
> pg_basebackup in the future. I have accepted as much of your input as
> I believe that I can without killing the project off completely. To go
> further, I'd have to either accept years of delay or abandon my
> priorities entirely and pursue yours.

While I'm hopeful that the parallel backup pieces will be useful to
out-of-core backup tools, I've been increasingly less confident that
it'll end up being very useful to pgbackrest, as much as I would like it
to be. Perhaps after it's in place we might be able to work on it to
make it useful, but we'd need to push all the features like encryption
and options for compression and such into the backend, in a way that
works for pgbackrest, to be able to leverage it, and I'm not sure that
would get much support or that it could be done in a way that doesn't
end up causing problems for pg_basebackup, which clearly wouldn't be
acceptable. Further, if we can't leverage the PG backup protocol that
you're building here, it seems pretty darn unlikely we'd have much use
for the manifest that's built as part of that.

I'm probably going to lose what credibility I have in criticizing what
you're doing with pg_basebackup here, but I started off saying you don't
have to make me happy and this is part of why- I really don't think
there's much that you're doing with pg_basebackup that is ultimately
going to impact what plans I have for the future, for pretty much
anything. I haven't got any real specific plans around pg_basebackup,
though, point-in-fact, if you put in a bunch of code that shows how to
get PG and pg_basebackup to do block-level incremental backups in a safe
and trusted way, that would actually be *really* useful to the
pgbackrest project because we could then lift that logic out of
pg_basebackup and leverage it. If I wanted to be entirely selfish, I'd
be pushing you to get block-level incremental backup into pg_basebackup
as quickly as possible so that we could have such an example of "how to
do it in a way that, if it breaks, the PG community will figure out what
went wrong and fix it". If you look at other things we've done, such as
not backing up unlogged tables, that's exactly the approach we've used:
introduce the feature into pg_basebackup *first*, make sure the
community agrees that it's a valid approach and will deal with any
issues with it (and will take pains to avoid *breaking* it in future
versions..), and only *then* introduce it into pgbackrest by using the
same approach. Those other features were well in-line with what makes
sense for pg_basebackup too though.

We haven't done that though, and I haven't been pushing in that
direction, not because I think it's a bad feature or that I want to
block something going into pg_basebackup or whatever, but because I
think it's actually going to cause more problems for users than it
solves because some users will want to use it (though not all, as we've
seen on this list, as there's at least some users out there who are as
scared of the idea of having *just* this in pg_basebackup without the
other things I talk about above as I am) and then they're going to try
and hack together all those other things they need around WAL management
and archiving and expiration and they're likely to get it wrong- perhaps
in obvious ways, perhaps in relatively subtle ways, but either way,
they'll end up with backups that aren't valid that they only discover
when they're in an emergency. Again, perhaps selfish me would say "oh
good, then they'll call me and pay me lots to fix it for them", but it
certainly wouldn't look good for the community- even if all of the
documentation and everything we put out there says that the way they
were doing it had this subtle issue or whatever (considering our docs
still promote a really bad, imv anyway, archive command kinda makes this
likely, if you ask me anyway..), and it wouldn't be good for the user.

> > That's what would make *me* happy. Even some comments about how to
> > *get* there while also working towards these features would be likely
> > to make me happy. Instead, I feel like we're being told that we need
> > this feature badly in v13 and we're going to cut bait and do whatever
> > is necessary to get us there.
>
> This seems like a really unfair accusation given how much work I've
> put into trying to satisfy you and David. If this patch, the parallel
> full backup patch, and the incremental backup patch were all to get
> committed to v13, an outcome which seems pretty unlikely to me at this
> point, then you would have a very significant number of things that
> you have requested in the course of the various discussions, and
> AFAICS the only thing you'd have that you don't want is the need to
> parse the manifest file using while (<>) { @a = split /\t/, $_ } rather
> than $a = parse_json(join '', <>). You would, for example, have the
> ability to request an individual file from the server rather than a
> complete tarball. Maybe the command that requests a file would lack an
> encryption option, something which IIUC you would like to have, but
> that certainly does not leave you worse off. It is easier to add an
> encryption option to a command which you already have than it is to
> invent a whole new command -- or really several whole new commands,
> since such a command is not really usable unless you also have
> facilities to start and stop a backup through the replication
> protocol.

No, the manifest format is definitely not the only issue that I have
with this- but as it relates to the thread about building a manifest, my
complaint really is isolated to the format and just forward thinking
about how the format you're advocating for will mean custom code for who
knows how many different tools. While I appreciate the offer to write
all the bespoke code for every version of the manifest for pgbackrest,
I'm really not thrilled about the idea of having to have that extra code
and having to then maintain it. Yes, when you compare the single format
of the manifest and the code required for it against a JSON parser, if
we only ever have this one format then it'd win in terms of code, but I
don't believe it'll end up being one format, instead we're going to end
up with multiple formats, each of which will have some additional code
for dealing with parsing it, and that's going to add up. That's also
going to, as I said before, make it almost certain that we can't use
older tools with newer backups. These are issues that we've thought
about and worried about over the years of pgbackrest and with that
experience we've come down on the side that a JSON-based format would be
an altogether better design. That's why we're advocating for it, not
because it requires more code or so that it delays the efforts here, but
because we've been there, we've used other formats, we've dealt with
user complaints when we do break things, this is all history for us
that's helped us learn- for PG, it looks like the future with a static
format, and I get that the future is hard to predict and pg_basebackup
isn't pgbackrest and yeah, I could be completely wrong because I don't
actually have a crystal ball, but this starting point sure looks really
familiar.
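
To make the compatibility worry concrete, here is a minimal sketch, not
anything taken from the patch. The field names (path, size, checksum,
mtime), the column order, and the top-level "files" key are all
assumptions invented for illustration. It compares a strict reader for a
fixed tab-separated layout with a JSON reader that skips keys it does
not know about.

```python
import json

# Hypothetical v1 layout: path, size, checksum.
# These names and their order are assumptions, not the real manifest format.
V1_COLUMNS = ("path", "size", "checksum")


def parse_tsv_line(line):
    """Parse one tab-separated entry; reject any other column count."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(V1_COLUMNS):
        raise ValueError("unexpected column count: %d" % len(fields))
    entry = dict(zip(V1_COLUMNS, fields))
    entry["size"] = int(entry["size"])
    return entry


def parse_json_manifest(text):
    """Parse a JSON manifest, keeping only the keys this reader knows."""
    doc = json.loads(text)
    return [{k: f[k] for k in V1_COLUMNS if k in f} for f in doc["files"]]


# An entry in the assumed v1 layout, and the same entry after a later
# (equally hypothetical) format revision adds an mtime field.
old_line = "base/1/1259\t8192\tabc123\n"
new_line = "base/1/1259\t8192\tabc123\t2020-01-08\n"
new_json = json.dumps({
    "version": 2,
    "files": [{"path": "base/1/1259", "size": 8192,
               "checksum": "abc123", "mtime": "2020-01-08"}],
})
```

With these definitions the v1 reader accepts old_line and raises
ValueError on new_line, while the JSON reader still returns the three
fields it understands from new_json. A tab-separated reader could of
course be written to tolerate trailing columns too, so this only
illustrates the concern about older tools and newer backups; it does not
settle it.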

Thanks,

Stephen
