Re: Updated backup APIs for non-exclusive backups

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Updated backup APIs for non-exclusive backups
Date: 2016-03-31 12:19:02
Message-ID: CABUevExxspZG5jQN8ahd=0yy-9TGJBxP_BLQ_EM1JSDqqb1EzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 31, 2016 at 4:00 AM, David Steele <david(at)pgmasters(dot)net> wrote:

> On 3/30/16 4:18 AM, Magnus Hagander wrote:
> >
> > On Wed, Mar 30, 2016 at 4:10 AM, David Steele <david(at)pgmasters(dot)net
> > <mailto:david(at)pgmasters(dot)net>> wrote:
> >
> > This certainly looks like it would work but it raises the barrier for
> > implementing backups by quite a lot. It's fine for backrest or
> barman
> > but it won't be pleasant for anyone who has home-grown scripts.
> >
> >
> > How much does it really raise the bar, though?
> >
> > It would go from "copy all files and make damn sure you copy pg_control
> > last, and rename it to pg_control.backup" to "take this binary blob you
> > got from the server and write it to pg_control.backup"?
> >
> > Also, the target of these APIs is specifically the backup tools and not
> > homewritten scripts.
>
> Then what would home-grown scripts use, the deprecated API that we know
> has issues?
>

They should use either pg_basebackup/pg_receivexlog, or they should use a
tool like backrest or barman.

> > A simple shellscript will have trouble enough using
> > it in the first place since it requires a persistent connection to the
> > database.
>
> All that's required is to spawn a psql process. I'm no shell expert but
> that's simple enough.
>

Oh, it's certainly doable, but you also need to come back and talk to it at
a later time (to call stop backup), and do something useful with a
multiline output. None of that is particularly easy. Certainly doable, but
it becomes the wrong tool for the job.

> > But those scripts are likely broken anyway.
>
> Yes, probably. Backup and especially archiving correctly are harder
> than most people realize.
>

Exactly. Which is why we should discourage people from doing that, and
instead point them to the tools that do the job right. This is the whole
reason we're making this change in the first place.

> > The main reason for Heikki to suggest this one over the other basic one
> > is that it brings protection against the "backup script/program crashed
> > halfway through but the user still tried to restore from that". They will
> > outright fail because there is no pg_control.backup in that case.
>
> But if we are going to make this complicated I'm not sure it's a big
> deal just to require pg_control to be copied last. pgBackRest already
> does that and Barman probably does, too.
>

It does (I did doublecheck that at some point).

> I don't see "don't copy pg_control" and "copy pg_control last" as all
> that different in terms of complexity.
>
> pgBackRest also *restores* pg_control last which I think is pretty
> important and would not be addressed by this solution.
>

So maybe we should at least start that way. And just document that clearly,
and then use the patch that we have right now?

Except we add so it still returns the stoppoint() as well (which is
returned from the current pg_stop_backup, but not in the new one).

We can always extend the function with more columns later if we need - it's
changing the ones we have that's a big problem there :)

Then we start this way and we can keep improving if necessary. But the
advantage of sticking to this for now is we don't have to touch the
recovery side at all. And we're pretty late in the cycle ATM.

> > If we
> > don't care about that, then we can go back to just saying "copy
> > pg_control last and we're done". But you yourself complained about that
> > requirement because it's too easy to get wrong (though you advocated
> > using backup_label to transfer the data over -- but that has the
> > potential for getting more complicated if we now or at any point in the
> > future want more than one field to transfer, for example).
>
> Perhaps. I'm still not convinced that getting some information from
> backup_label and other information from pg_control is a good solution.
> I would rather write the recovery point into the backup_label and use
> that instead of the value in pg_control. Text files are much easier to
> parse and push around accurately (and test).
>

What about all the *other* functionality that's then in pg_control?

We could do as Heikki at one point suggested in our chat as well which is
basically write all of pg_control out to the backup_label file, and
reconstruct it from that. But that makes that file a lot more complicated...

//Magnus

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2016-03-31 12:19:17 Re: Sequence Access Method WIP
Previous Message Vitaly Burovoy 2016-03-31 11:31:26 Re: [PATH] Jsonb, insert a new value into an array at arbitrary position