Re: Updated backup APIs for non-exclusive backups

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Updated backup APIs for non-exclusive backups
Date: 2018-11-25 20:45:01
Message-ID: f4757edf28bdcec95d54ec0eb7a8b2afe62191c6.camel@cybertec.at
Lists: pgsql-hackers

Stephen Frost wrote:
> > > Seeing it often doesn't make it a good solution. Running just
> > > pre-backup and post-backup scripts and copying the filesystem isn't
> > > enough to perform an online PostgreSQL backup- the WAL needs to be
> > > collected as well, and you need to make sure that you have all of the
> > > WAL before the backup can be considered complete.
> >
> > Yes, that's why "pg_stop_backup" has the "wait_for_archive" parameter.
> > So this is not a problem.
>
> That doesn’t actually make sure you have all of the WAL reliably saved
> across the backup, it just cares what archive command returns, which is
> sadly often a bad thing to depend on. I certainly wouldn’t rely on only
> that for any system I cared about.

If you write a bad archive_command, you have a problem.
But that is quite unrelated to the problem at hand, if I am not mistaken.

> > > On restore, you're
> > > going to need to create a recovery.conf (at least in released versions)
> > > which provides a restore command (needed even in HEAD today) to get the
> > > old WAL, so having to also create the backup_label file shouldn't be
> > > that difficult.
> >
> > You write "recovery.conf" upon recovery, when you have the restored
> > backup, so you have it on a file system. No problem adding a file then.
> >
> > This is entirely different from adding a "backup_label" file to
> > a backup that has been taken by a backup software in some arbitrary
> > format in some arbitrary location (think snapshot).
>
> There isn’t any need to write the backup label before you restore the database,
> just as you write recovery.conf then.

Granted.
But it is pretty convenient, and writing it to the data directory right away
is a good thing on top, because it reduces the danger of inadvertently
starting the backup without recovery.
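
For the restore-time variant conceded above, here is a minimal sketch of
writing the label into a restored data directory. It assumes the backup was
taken with the non-exclusive API and that the "labelfile" and "spcmapfile"
columns returned by pg_stop_backup(false) were saved somewhere. The function
name write_backup_files and its parameters are hypothetical, not part of
PostgreSQL.

```python
import os


def write_backup_files(data_dir, labelfile, spcmapfile=""):
    """Write backup_label (and tablespace_map, if non-empty) into data_dir.

    labelfile and spcmapfile are the second and third columns returned by
    pg_stop_backup(false). They are written unmodified, in binary mode.
    Assumption: str input is encoded as UTF-8; pass bytes to avoid that.
    """
    if not labelfile:
        raise ValueError("labelfile is empty; backup_label would be useless")

    written = []
    for name, content in (("backup_label", labelfile),
                          ("tablespace_map", spcmapfile)):
        if not content:
            # An empty spcmapfile means no tablespace_map is needed.
            continue
        if isinstance(content, str):
            content = content.encode("utf-8")
        path = os.path.join(data_dir, name)
        with open(path, "wb") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # make sure it is on disk before startup
        written.append(name)
    return written
```

This has to run before the restored cluster is started for the first time,
which is exactly the step that is easy to forget.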

> > > Lastly, if you really want, you can extract out the data from
> > > pg_stop_backup in whatever your post-backup script is.
> >
> > Come on, now.
> > You usually use backup techniques like that because you can't get
> > your large database backed up in the available time window otherwise.
>
> I’m not following what you’re trying to get at here, why can’t you extract
> the data for the backup label from pg_stop_backup..? Certainly other tools
> do, even ones that do extremely fast parallel backups.. the two are
> completely independent.
>
> Did you think I meant pg_basebackup..? I certainly didn’t.

Oh yes, I misunderstood. Sorry.

Yes, you can come up with a post-backup script that somehow communicates
with your pre-backup script to get the information, but it sure is
inconvenient. Simplicity is good in backup solutions, because complicated
things tend to break more easily.
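
The inconvenience comes from the fact that, with the non-exclusive API, the
connection that called pg_start_backup has to stay open until pg_stop_backup
is called, so separate pre-backup and post-backup scripts cannot simply each
run their own psql. The following is a sketch of the single-process
alternative, not a complete backup tool. It assumes a DB-API connection with
"%s" placeholders (as in psycopg2) and PostgreSQL 10 or later for the
wait_for_archive argument; the name non_exclusive_backup is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def non_exclusive_backup(conn, label):
    """Hold one session open around a file-system copy or snapshot.

    conn is assumed to be an open DB-API connection (e.g. from psycopg2).
    The yielded dict is filled with lsn, labelfile and spcmapfile once
    pg_stop_backup has returned.
    """
    result = {}
    cur = conn.cursor()
    # Arguments: label, fast = false, exclusive = false
    cur.execute("SELECT pg_start_backup(%s, false, false)", (label,))
    cur.fetchone()

    # The caller copies or snapshots the data directory here. If that
    # raises, the code below is skipped and pg_stop_backup is never
    # called; closing the connection then aborts the backup.
    yield result

    # Arguments: exclusive = false, wait_for_archive = true
    cur.execute(
        "SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false, true)"
    )
    lsn, labelfile, spcmapfile = cur.fetchone()
    result.update(lsn=lsn, labelfile=labelfile, spcmapfile=spcmapfile)
```

Used as "with non_exclusive_backup(conn, 'nightly') as info:", the snapshot
command runs inside the block and info["labelfile"] is available afterwards
to be stored next to the backup.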

> > I thought our goal is to provide convenient backup methods...
>
> Correctness would be first and having a broken system because of a crash during a backup isn’t correct.

"Not starting up without manual intervention" is not actually broken...

> > But what's wrong with retaining the exclusive backup method and just
> > sticking a big "Warning: this may cause a restart to fail after a crash"
> > on it? That sure wouldn't be unsafe.
>
> I haven’t seen anyone pushing for it to be removed immediately, but users should
> not use it and newcomers would be much better served by using the non-exclusive API.
> There is a reason it was deprecated and it’s because it simply isn’t a good API.
> Coming along a couple years later and saying that it’s a good API while ignoring
> the issues that it has doesn’t change that.

I don't think I'm ignoring the issues; I just think there is a valid use case for
the exclusive backup API, with all its caveats.

Of course I'm not arguing on behalf of organizations running lots of databases
for whom manual intervention after a crash is unacceptable.

I'm arguing on behalf of users who run a few databases, want a simple backup
solution and are ready to deal with the shortcomings.

But I will gladly accept defeat in this matter; I just needed to voice my opinion.

Yours,
Laurenz Albe
