Re: [HACKERS] Point in Time Recovery

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: markw(at)osdl(dot)org, pgman(at)candle(dot)pha(dot)pa(dot)us, kn(at)mgnet(dot)de, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Point in Time Recovery
Date: 2004-07-19 07:35:05
Message-ID: 1090222505.17493.22360.camel@stromboli
Lists: pgsql-admin pgsql-hackers pgsql-patches

On Mon, 2004-07-19 at 04:03, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Latest version, pitr_v5_2.patch...
>
> Reviewed and committed with some adjustments.
>

Wow! Thanks very much - you work fast.

I'll be re-testing later today.

> I see the following significant loose ends:
>
> * Documentation is, um, lacking. (One point in particular is that I
> inserted the recovery.conf.sample file into CVS, but did not fill in
> the patch's lack of attempt to install it anywhere.)
>

Yes...wasn't sure what to do with that. Is everybody happy to install it
as a sample into the main Data Directory? (i.e. as recovery.conf.sample
rather than recovery.conf which would be a bad thing).

> * As Bruce has pointed out already, the process of making a backup
> needs some improvements for more safety: the starting and ending WAL
> offsets have got to be recorded somehow.
>

Haven't got to that yet, but will do.

> * As I have pointed out already, we need to invent "timelines" to
> allow incompatible WAL segments to exist side-by-side. I will volunteer
> to look into this.

Yes, discussing on the other thread.

>
> * I think creating a .ready file during XLogFileOpen is completely bogus,
> for reasons mentioned in committed comments (look for XXX). Possibly
> this can go away with timelines.

Yes, to some extent it would go away with timelines.

If you have a local copy at the end of a timeline that isn't archived,
then it seems a good idea to archive it, or at least copy it somewhere
safe. If you don't then you will not be able to revert to a full
recovery of that timeline in the future should you choose to do so.
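As a rough illustration of the "copy it somewhere safe" idea, here is a minimal Python sketch. It assumes a layout in which a marker file named <segment>.ready under pg_xlog/archive_status means the segment has not yet been archived. The function name, the destination directory, and that layout are assumptions made for the example, not something the committed patch provides.

```python
import shutil
from pathlib import Path


def copy_unarchived_segments(xlog_dir, safe_dir):
    """Copy WAL segments that still have a .ready marker to safe_dir.

    Assumption (hypothetical layout): xlog_dir/archive_status/<segment>.ready
    exists for every segment that has not been archived yet.
    Returns the names of the segments that were copied.
    """
    xlog = Path(xlog_dir)
    dest = Path(safe_dir)
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for marker in sorted((xlog / "archive_status").glob("*.ready")):
        # "0000...03.ready" -> segment file "0000...03" in xlog_dir
        segment = xlog / marker.stem
        if segment.is_file():
            shutil.copy2(segment, dest / segment.name)
            copied.append(segment.name)
    return copied
```

Something like copy_unarchived_segments("/path/to/pg_xlog", "/path/to/safe") would be run before starting a recovery. The segments are copied rather than moved, so the local files stay where they are.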

The code and its location may be somewhat more suspect.... :)

>
> * I am wondering if it wouldn't be a good idea to remove the local copy
> of any segment we successfully obtain from archive. The existing
> comments note that we might get a wrong or corrupted file from archive,
> but aren't we in at least as much risk of using an obsolete segment
> restored from backup if we leave the local segment in place? (The
> archive recovery run itself will know not to do this, but if we crash
> shortly thereafter, the ensuing recovery run would NOT know not to
> trust such files.)
>

I agree they're a loose end that needs some thought.

I avoided that decision by working around the files. We originally agreed
that we would keep that data. The reason was that you can't tell whether
the files were restored from a backup that forgot to exclude pg_xlog, or
whether we are choosing to do a PITR recovery on an otherwise healthy
system (or, as the comments explain, maybe we lost everything except
pg_xlog).

If we crash during recovery, the server doesn't crash-recover and restart.

If we crash after recovery, then the checkpoint record will have moved
forward, and so we don't then accidentally re-use those local copies.

Timelines will solve this...
>
> Perhaps the last point is really a backup-process issue. AFAICS there
> is no good reason for a backup tarfile to include $PGDATA/pg_xlog at
> all, and some good reasons for it not to. Can we redesign either the
> backup process or the disk layout so that that will not happen? Then
> we could stop worrying about stale local pg_xlog files.
>

That's the way I saw it.

Seems fairly easy to say "don't backup pg_xlog", but you can't guarantee
they won't, even if you tell them not to...
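One way to make "don't backup pg_xlog" harder to get wrong is to have the backup tool exclude the directory itself. The Python sketch below shows one possible approach using the standard tarfile module. The function name and the idea of wrapping the backup in a script are assumptions for illustration, not part of the patch.

```python
import tarfile
from pathlib import Path


def backup_data_dir(data_dir, tar_path, excluded=("pg_xlog",)):
    """Write data_dir to a tar file, skipping excluded top-level subdirectories.

    Hypothetical helper: by default, pg_xlog directly under the data
    directory is left out of the archive, along with everything beneath it.
    """
    data = Path(data_dir)

    def skip_excluded(info):
        # Member names look like "<datadir>/pg_xlog/<segment>".
        parts = Path(info.name).parts
        if len(parts) > 1 and parts[1] in excluded:
            return None  # returning None drops the entry and its children
        return info

    with tarfile.open(tar_path, "w") as tar:
        tar.add(str(data), arcname=data.name, filter=skip_excluded)
```

The pg_xlog directory entry itself is also omitted, so a restore procedure built on this sketch would have to recreate it. This does not stop anyone from using a different backup method that still includes the directory.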

What is stale today may actually be considered your best option
when testing to see whether a recovery has achieved your objectives.

I'll read the whole patch and your comments, and test, before I respond
further. Thanks for working so hard on this, so quickly.

Best Regards, Simon Riggs
