Re: Proposal: Incremental Backup

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Gabriele Bartolini <gabriele(dot)bartolini(at)2ndquadrant(dot)it>, desmodemone <desmodemone(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Incremental Backup
Date: 2014-08-06 15:20:50
Message-ID: 20140806152050.GJ13302@momjian.us
Lists: pgsql-hackers

On Wed, Aug 6, 2014 at 06:48:55AM +0100, Simon Riggs wrote:
> On 6 August 2014 03:16, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > On Wed, Aug 6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
> >> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >> >
> >> > On 5 August 2014 22:38, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> >> > Thinking some more, there seems like this whole store-multiple-LSNs
> >> > thing is too much. We can still do block-level incrementals just by
> >> > using a single LSN as the reference point. We'd still need a complex
> >> > file format and a complex file reconstruction program, so I think that
> >> > is still "next release". We can call that INCREMENTAL BLOCK LEVEL.
> >>
> >> Yes, that's the approach taken by pg_rman for its block-level
> >> incremental backup. Btw, I don't think that the CPU cost to scan all
> >> the relation files added to the one to rebuild the backups is worth
> >> doing it on large instances. File-level backup would cover most of the
> >
> > Well, if you scan the WAL files from the previous backup, that will tell
> > you which pages need incremental backup.
>
> That would require you to store that WAL, which is something we hope
> to avoid. Plus if you did store it, you'd need to retrieve it from
> long term storage, which is what we hope to avoid.

Well, for file-level backups we have:

1) use file modtime (possibly inaccurate)
2) use file modtime and checksums (heavy read load)
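
As a rough sketch of how options 1 and 2 might select files, assuming the
previous backup left behind a manifest mapping each relative path to its
modification time and a SHA-256 checksum (the manifest layout and the names
`changed_files` and `file_checksum` are hypothetical, not part of any
existing tool):

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(data_dir, manifest, use_checksums=False):
    """List files that look changed since the previous backup.

    manifest: dict of relative path -> (mtime_ns, sha256 hex), assumed to
    have been recorded by the previous backup (hypothetical format).
    """
    changed = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, data_dir)
            previous = manifest.get(rel)
            if previous is None:
                changed.append(rel)      # new file since the last backup
                continue
            prev_mtime, prev_sum = previous
            if os.stat(path).st_mtime_ns != prev_mtime:
                changed.append(rel)      # option 1: modtime differs
            elif use_checksums and file_checksum(path) != prev_sum:
                changed.append(rel)      # option 2: modtime missed a change
    return sorted(changed)
```

With `use_checksums=True` every file whose modtime is unchanged gets read in
full, which is where the heavy read load of option 2 comes from in this
reading of it.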

For block-level backups we have:

3) accumulate block numbers as WAL is written
4) read previous WAL at incremental backup time
5) read data page LSNs (high read load)
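
To make option 5 concrete, here is a minimal sketch that scans one relation
file and reports the blocks whose page LSN is newer than a single reference
LSN. It assumes the default 8192-byte block size, a little-endian machine,
and that the page LSN is stored in the first 8 bytes of each page as two
32-bit halves; the function names are hypothetical:

```python
import struct

BLCKSZ = 8192  # assumed: default PostgreSQL block size


def page_lsn(page):
    """Decode the LSN stored in the first 8 bytes of a page header."""
    # Two 32-bit halves, high then low; little-endian is an assumption.
    high, low = struct.unpack_from("<II", page, 0)
    return (high << 32) | low


def changed_blocks(path, ref_lsn):
    """Return block numbers whose page LSN is greater than ref_lsn."""
    changed = []
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(BLCKSZ)
            if len(page) < BLCKSZ:
                break                    # end of file (or a partial page)
            if page_lsn(page) > ref_lsn:
                changed.append(blkno)
            blkno += 1
    return changed
```

Every page of every relation file has to be read to make this decision, which
is the high read load noted for option 5. Never-initialized (all-zero) pages
carry an LSN of 0 and would be skipped by this sketch.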

The question is which of these we want to implement. #1 is very easy
to implement, but incremental _file_ backups are larger than block-level
backups. If we have #5, would we ever want #2? If we have #3, would we
ever want #4 or #5?

> > I am thinking we need a wiki page to outline all these options.
>
> There is a Wiki page.

I would like to see that wiki page have a more open approach to
implementations.

I do think this is a very important topic for us.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
