Re: Updated backup APIs for non-exclusive backups

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Updated backup APIs for non-exclusive backups
Date: 2016-03-30 08:18:00
Message-ID: CABUevEy38oXGHaSfa=SgZGjpDVDnLugPcw+X7SshwdDi1FX7Jw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 30, 2016 at 4:10 AM, David Steele <david(at)pgmasters(dot)net> wrote:

> On 3/29/16 2:09 PM, Magnus Hagander wrote:
>
> > I had a chat with Heikki, and here's another suggestion:
> >
> > 1. We don't touch the current exclusive backups at all, as previously
> > discussed, other than deprecating their use. For backwards compat.
> >
> > 2. For new backups, we return the contents of pg_control as a bytea from
> > pg_stop_backup(). We tell backup programs they are supposed to write
> > this out as pg_control.backup, *not* as pg_control.
> >
> > 3a. On recovery, if it's an exclusive backup, we do as we did before.
> >
> > 3b. on recovery, in non-exclusive backups (determined from
> > backup_label), we check that pg_control.backup exists *and* that
> > pg_control does *not* exist. That guards us reasonably against backup
> > programs that do the wrong thing, and we know we get the correct version
> > of pg_control.
> >
> > 4. (we can still add the stop location to the backup_label file in case
> > backup programs find it useful, but we don't use it in recovery)
> >
> > Thoughts about this approach?
>
> This certainly looks like it would work but it raises the barrier for
> implementing backups by quite a lot. It's fine for backrest or barman
> but it won't be pleasant for anyone who has home-grown scripts.
>
>
How much does it really raise the bar, though?

It would go from "copy all files and make damn sure you copy pg_control
last, and rename it to pg_control.backup" to "take this binary blob you got
from the server and write it to pg_control.backup"?
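
To make the second half of that comparison concrete, here is a minimal sketch of what a backup tool would have to do with the blob. It assumes the proposal above, where pg_stop_backup() hands back the pg_control contents as bytes; how the blob is fetched is left out. The function name write_control_backup and the backup_dir layout are hypothetical, made up for illustration.

```python
import os
import tempfile


def write_control_backup(backup_dir: str, control_blob: bytes) -> str:
    """Write the server-supplied blob to <backup_dir>/pg_control.backup.

    Hypothetical helper: the name, signature, and directory layout are
    assumptions for illustration, not part of any PostgreSQL API.
    """
    target = os.path.join(backup_dir, "pg_control.backup")
    # Use a temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=backup_dir, prefix="pg_control.tmp.")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(control_blob)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Rename into place only once the contents are fully on disk, so
        # a crash leaves either no pg_control.backup or a complete one.
        os.replace(tmp_path, target)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target
```

The write-then-rename step is a design choice of this sketch, not something the proposal requires; it just keeps a half-written pg_control.backup from ever being visible.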

Also, the target of these APIs is specifically the backup tools and not
home-written scripts. A simple shell script will have trouble enough using
it in the first place, since it requires a persistent connection to the
database. But those scripts are likely broken anyway.

You can of course keep the current requirement, which is just "copy
pg_control last", but if we do that we have zero way of checking that it
actually happened, and you may end up with subtly broken restores if the
backup software gets it wrong. (Of course it can get the rename/writeout
thing wrong as well, but that's going to be a lot more obvious if you're
doing it wrong.)

The main reason for Heikki to suggest this one over the other, more basic
one is that it brings protection against the "backup script/program
crashed halfway through but the user still tried to restore from that"
case. Such restores will fail outright because there is no
pg_control.backup in that case. If we don't care about that, then we can
go back to just saying "copy pg_control last and we're done". But you
yourself complained about that requirement because it's too easy to get
wrong (though you advocated using backup_label to transfer the data
over -- but that has the potential for getting more complicated if we,
now or at any point in the future, want more than one field to transfer,
for example).
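
For reference, the recovery-side check from point 3b can be sketched roughly as below. This is only an illustration of the rule as described in this thread, not server code: the function name backup_layout_error and the control_dir parameter are hypothetical, and the real check would live in the server's startup path.

```python
import os
from typing import Optional


def backup_layout_error(control_dir: str) -> Optional[str]:
    """Return a reason to refuse recovery, or None if the layout is fine.

    Hypothetical illustration of rule 3b for a non-exclusive backup:
    pg_control.backup must exist and pg_control must not.
    """
    backup_copy = os.path.join(control_dir, "pg_control.backup")
    live_copy = os.path.join(control_dir, "pg_control")
    if not os.path.exists(backup_copy):
        # A backup that crashed before the final step ends up here.
        return "pg_control.backup is missing; the backup may be incomplete"
    if os.path.exists(live_copy):
        # The tool copied pg_control along with everything else.
        return "pg_control is present; the backup tool copied it directly"
    return None
```

A backup that died halfway never gets a pg_control.backup written, so it trips the first condition instead of restoring into a subtly broken state.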

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
