Re: "Resurrected" data files - problem?

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, Peter Childs <peterachilds(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: "Resurrected" data files - problem?
Date: 2007-11-09 22:17:11
Message-ID: 1194646631.4251.516.camel@ebony.site
Lists: pgsql-general

On Fri, 2007-11-09 at 10:59 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > On Fri, 2007-11-09 at 10:28 +0100, Albe Laurenz wrote:
> >> I think that understanding is finally dawning here.
> >>
> >> The problem you see is that the backup software might decide
> >> that the file has not been changed, skip it and go on backing
> >> up other files, but the file can still be modified before
> >> pg_stop_backup(), correct?
>
> > Correct.
>
> Surely that's nonsense --- otherwise a time-extended base backup
> could not work either.
>
> What is required of the filesystem backup process is that each 8K page
> of each file be restored to a state that it had at some time between
> pg_start_backup and pg_stop_backup. The exact time can be different for
> different pages. I don't see a reason to think that a base+incremental
> backup method can't meet that requirement.

Hmm, OK, I do think we can improve on what I said before. What I said
was safe though, not nonsense, plus I think the difference between the
two is a hair's breadth in practice. Most of the time we're talking about
excluding historical data partitions, designed specifically to minimise
backup windows and data maintenance.

The question is which timestamps do you compare in order to arrive at
the maximal set of files that don't need to be backed up twice?

I'm assuming we're talking about starting recovery at the second
pg_start_backup(). Any file that changes during the first backup cannot
be a candidate, so a file's timestamp taken before the first backup must
equal its timestamp taken before the second backup. So we can effectively
reduce things to just 2 timestamps, not 4 as I had originally said.

So we have:

pg_start_backup()
timestamp1
full backup
pg_stop_backup()

...

pg_start_backup() --- WAL chain starts here
timestamp2
incremental backup
pg_stop_backup()

Any file for which ts1 == ts2 can be skipped during the incremental
backup.
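A minimal sketch of the ts1 == ts2 rule, assuming the "timestamp" is each file's modification time (st_mtime_ns) and that a snapshot is taken by walking the data directory at timestamp1 and again at timestamp2. The function names snapshot_mtimes and files_to_skip are hypothetical; they are not part of PostgreSQL or of any backup tool.

```python
import os
from typing import Dict, Set


def snapshot_mtimes(datadir: str) -> Dict[str, int]:
    """Record the mtime (nanoseconds) of every regular file under datadir.

    Assumption: file mtime stands in for the per-file "timestamp" in the post.
    Keys are paths relative to datadir so two snapshots can be compared.
    """
    mtimes: Dict[str, int] = {}
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, datadir)
            mtimes[rel] = os.stat(path).st_mtime_ns
    return mtimes


def files_to_skip(ts1: Dict[str, int], ts2: Dict[str, int]) -> Set[str]:
    """Files whose timestamp at timestamp1 equals the one at timestamp2.

    A file missing from ts1 (created after the first snapshot) never matches,
    so it is always included in the incremental backup.
    """
    return {path for path, mtime in ts2.items() if ts1.get(path) == mtime}
```

Usage: call snapshot_mtimes() right after the first pg_start_backup() and again right after the second, then copy every file not returned by files_to_skip(). Filesystem mtime granularity can hide a write that lands in the same tick as a snapshot, so this is a heuristic under the stated assumption, not a guarantee.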

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
