Re: Point in Time Recovery

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-05 23:11:45
Message-ID: 1089069104.17493.132.camel@stromboli
Lists: pgsql-admin pgsql-hackers pgsql-patches

On Mon, 2004-07-05 at 22:46, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Should we use a different datatype than time_t for the commit timestamp,
> > one that offers more fine grained differentiation between checkpoints?
>
> Pretty much everybody supports gettimeofday() (time_t and separate
> integer microseconds); you might as well use that. Note that the actual
> resolution is not necessarily microseconds, and it'd still not be
> certain that successive commits have distinct timestamps --- so maybe
> this refinement would be pointless. You'll still have to design a user
> interface that allows selection without the assumption of distinct
> timestamps.

Well, I agree, though since we don't yet have the desired UI, I think some
finer-grained mechanism would be good. This means extending the xlog commit
record by a couple of bytes...OK, let's live a little.

> > - when we stop, keep reading records until EOF, just don't apply them.
> > When we write a checkpoint at end of recovery, the unapplied
> > transactions are buried alive, never to return.
> > - stop where we stop, then force zeros to EOF, so that no possible
> > record remains of previous transactions.
>
> Go with plan B; it's best not to destroy data (what if you chose the
> wrong restart point the first time)?
>

Eh? Which way round? The second plan was the one where I would destroy
data by overwriting it; that's exactly why I preferred the first.

Actually, the files are always copied from the archive, so re-recovery is
always an available option in the design that's been implemented.

No matter...

> Actually this now reminds me of a discussion I had with Patrick
> Macdonald some time ago. The DB2 practice in this connection is that
> you *never* overwrite existing logfile data when recovering. Instead
> you start a brand new xlog segment file,

Now that's a much better plan... I suppose I just have to advance the
recovery pointer to the first record on the first page of a new xlog
file, similar to the first plan, but fast-forwarding rather than reading
through the remaining records.
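
A minimal sketch of that fast-forward follows. It assumes WAL positions are modelled as a single 64-bit byte offset and uses the default 16 MB segment size. Both are simplifications, and WAL_SEG_SIZE and next_segment_start() are hypothetical names. The sketch also ignores the page header that would precede the first record in the new file.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed segment size: 16 MB (simplified model, not the real macros). */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Given the WAL position where recovery stopped, return the first byte
 * of the segment after the one containing 'stop'. New WAL is written
 * from there, so the tail of the old segment, which still holds the
 * records recovery chose not to apply, is never overwritten. */
static uint64_t next_segment_start(uint64_t stop)
{
    uint64_t seg = stop / WAL_SEG_SIZE;        /* segment holding 'stop' */
    uint64_t next = (seg + 1) * WAL_SEG_SIZE;  /* start of the next one */

    assert(next > stop);                       /* always moves forward */
    return next;
}
```

Because the result always lies in a fresh segment, nothing past the stop point is overwritten, even when the stop point falls exactly on a segment boundary.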

My only concern was the secondary checkpoint marker, which is always
reset to the place you just restored FROM when you complete a recovery.
That could lead to a situation where you recover, then, before the next
checkpoint, fail and lose the last checkpoint marker, then crash-recover
from the previous checkpoint (again), but this time replay the records
you were careful to avoid.
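
The hazard can be illustrated with a toy model. Everything here is hypothetical and does not reflect the real control file. Records are numbered, recovery applied those up to stop, and the end-of-recovery checkpoint sits after the skipped tail. The secondary marker points back at where recovery started.

```c
#include <assert.h>

/* Toy control file: positions are record numbers; 0 means "lost". */
typedef struct ControlModel
{
    int checkpoint;       /* latest checkpoint, written after the skipped tail */
    int prev_checkpoint;  /* secondary marker, reset to the recovery start */
} ControlModel;

/* Where crash recovery would begin redo: the latest checkpoint if it
 * survived, otherwise the secondary marker. */
static int crash_redo_start(const ControlModel *c)
{
    return c->checkpoint != 0 ? c->checkpoint : c->prev_checkpoint;
}

/* Count records after 'stop' (the ones recovery deliberately skipped,
 * still present in the file up to 'wal_end') that crash recovery would
 * replay starting from crash_redo_start(). */
static int skipped_records_replayed(const ControlModel *c, int wal_end,
                                    int stop)
{
    int count = 0;

    assert(stop <= wal_end);
    for (int rec = crash_redo_start(c); rec <= wal_end; rec++)
        if (rec > stop)
            count++;
    return count;
}
```

With the latest checkpoint intact, crash recovery replays none of the skipped records. Once that marker is lost, redo starts from the secondary marker and every skipped record after stop gets applied.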

> which is given a new "branch
> number" so it can be distinguished from the future-time xlog segments
> that you chose not to apply. I don't recall what the DB2 terminology
> was exactly --- not "branch number" I don't think --- but anyway the
> idea is that when you restart the database after an incomplete recovery,
> you are now in a sort of parallel universe that has its own history
> after the branch point (PITR stop point). You need to be able to
> distinguish archived log segments of this parallel universe from those
> of previous and subsequent incarnations.

That's a good idea, if only because it's so easy to screw up your test
data across multiple recoveries. But if it's good during testing, it
must be good in production too, since you may well perform a
recovery, run for a while, then discover that you got it wrong the first
time and need to recover again. I had already added that to my list of
gotchas, and this would solve it.

I was going to say hats off to the Blue-hued ones, when I remembered
this little gem from last year:
http://www.danskebank.com/link/ITreport20030403uk/$file/ITreport20030403uk.pdf

> I'm not sure whether Vadim
> intended our StartUpID to serve this purpose, but it could perhaps be
> used that way, if we reflected it in the WAL file names.
>

Well, I'm not sure about StartUpID... but the high 2 bytes of LogId
certainly look as if they will never be anything but zeros. You have 2.4
x 10^14... which is 9,000 years at 1000 log files/sec.
We could use the scheme you describe:
add 0x10000 to the logid (i.e., increment its high 2 bytes by one) every
time you complete an archive recovery... so the log files look like
0001000000000CE3 after you've recovered a load of files that look like
0000000000000CE3.
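
The naming scheme can be sketched in C. This assumes, as the examples suggest, that a WAL file name is the 32-bit log id followed by the 32-bit segment number, each written as eight upper-case hex digits, and that the high 2 bytes of the log id become a branch counter. All function names are hypothetical. Incrementing the high 2 bytes by one is the same as adding 0x10000 to the log id, and a 16-bit counter allows 65,535 bumps before it overflows.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BRANCH_SHIFT 16
#define BRANCH_MAX   0xFFFFu  /* only the high 16 bits carry the branch */

/* Branch number carried in the high 2 bytes of a log id. */
static uint32_t branch_of(uint32_t logid)
{
    return logid >> BRANCH_SHIFT;
}

/* After an archive recovery, bump the branch: logid + 0x10000.
 * Returns -1 (leaving *out untouched) if the counter would overflow. */
static int bump_branch(uint32_t logid, uint32_t *out)
{
    if (branch_of(logid) >= BRANCH_MAX)
        return -1;
    *out = logid + (1u << BRANCH_SHIFT);
    return 0;
}

/* Format the 16-hex-digit file name; buf needs at least 17 bytes. */
static void wal_file_name(char *buf, size_t len, uint32_t logid, uint32_t seg)
{
    assert(len >= 17);
    snprintf(buf, len, "%08X%08X", (unsigned) logid, (unsigned) seg);
}

/* Parse an archived file name back into log id and segment, so the
 * branch it belongs to can be checked. Returns 0 on success, -1 if the
 * name is not 16 hex digits. */
static int parse_wal_file_name(const char *name, uint32_t *logid, uint32_t *seg)
{
    unsigned int lg, sg;

    if (strlen(name) != 16)
        return -1;
    if (sscanf(name, "%8X%8X", &lg, &sg) != 2)
        return -1;
    *logid = (uint32_t) lg;
    *seg = (uint32_t) sg;
    return 0;
}
```

For example, bumping log id 0x00000000 and formatting segment 0xCE3 yields 0001000000000CE3, and parsing that name back gives branch 1, so segments from before and after the recovery can be told apart.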

If you used StartUpID directly, you might just run out... but it's very
unlikely you would ever perform 65,000 recoveries, unless you've run the
<expletive> code as often as I have :(.

Doing that also means we don't have to work out how to do the same thing
with StartUpID. Of course, altering the length and makeup of the xlog
files is possible too, but that would cause other things to stop working...

[We'll have to give this a non-sci-fi name, unless we want to make
inroads into the Dr. Who fanbase :) Don't get them started. Better
still, don't give it a name at all.]

I'll sleep on that lot.

Best regards, Simon Riggs
