Re: Point in Time Recovery

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Point in Time Recovery
Date: 2004-07-16 23:57:39
Message-ID: 200407162357.i6GNvdZ09973@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


OK, I think I have some solid ideas and reasons for them.

First, I think we need server-side functions to call when we start/stop
the backup. The advantage of these server-side functions is that they
will do the required work of recording the pg_control values and
creating needed files with little chance for user error. It also allows
us to change the internal operations in later releases without requiring
admins to change their procedures. We are even able to adjust the
internal operation in minor releases without forcing a new procedure on
users.

Second, I think once we start a restore, we should rename recovery.conf
to recovery.in_progress, and when complete rename that to
recovery.done. If the postmaster starts and sees recovery.in_progress,
it will fail to start knowing its recovery was interrupted. This allows
the admin to take appropriate action. (I am not sure what that action
would be. Does he bring back the backup files or just keep going?)

Third, I think we need to put a file in the archive location once we
complete a backup, recording the start/stop xid and wal/offsets. This
gives the admin documentation on what archive logs to keep and what xids
are available for recovery. Ideally the recover program would read that
file and check the recover xid to make sure it is after the stop xid
recorded in the file.

How would the recover program know the name of that file? We need to
create it in /data with start contents before the backup, then complete
it with end contents and archive it.

What should we name it? Ideally it would be named by the WAL
name/offset of the start so it orders in the proper spot in the archive
file listing, e.g.:

000000000000093a
000000000000093b
000000000000093b.032b9.start
000000000000093c

Are people going to know they need 000000000000093b for
000000000000093b.032b9.start? I hope so. Another idea is to do:

000000000000093a.xlog
000000000000093b.032b9.start
000000000000093b.xlog
000000000000093c.xlog

This would order properly. It might be a very good idea to add
extensions to these log files now that we are archiving them in strange
places. In fact, maybe we should use *.pg_xlog to document the
directory they came from.

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Fri, 2004-07-16 at 16:47, Tom Lane wrote:
> > As far as the business about copying pg_control first goes: there is
> > another way to think about it, which is to copy pg_control to another
> > place that will be included in your backup. For example the standard
> > backup procedure could be
> >
> > 1. [somewhat optional] Issue CHECKPOINT and wait till it finishes.
> >
> > 2. cp $PGDATA/global/pg_control $PGDATA/pg_control.dump
> >
> > 3. tar cf /dev/mt $PGDATA
> >
> > 4. do something to record ending WAL position
> >
> > If we standardized on this way, then the tar archive would automatically
> > contain the pre-backup checkpoint position in ./pg_control.dump, and
> > there is no need for any special assumptions about the order in which
> > tar processes things.
> >
>
> Sounds good. That would be familiar to Oracle DBAs doing BACKUP
> CONTROLFILE. We can document that and off it as a suggested procedure.
>
> > However, once you decide to do things like that, there is no reason why
> > the copied file has to be an exact image of pg_control. I claim it
> > would be more useful if the copied file were plain text so that you
> > could just "cat" it to find out the starting WAL position; that would
> > let you determine without any special tools what range of WAL archive
> > files you are going to need to bring back from your archives.
>
> I wouldn't be in favour of a manual mechanism. If you want an automated
> mechanism, whats wrong with using the one thats already there? You can
> use pg_controldata to read the controlfile, again whats wrong with that?
>
> We agreed some time back that an off-line xlog file inspector would be
> required to allow us to inspect the logs and make a decision about where
> to end recovery. You'd still need that.
>
> It's scary enough having to specify the end point, let alone having to
> specify the starting point as well.
>
> At your request, and with Bruce's idea, I designed and built the
> recovery system so that you don't need to know what range of xlogs to
> bring back. You just run it, it brings back the right files from archive
> and does recovery with them, then cleans up - and it works without
> running out of disk space on long recoveries.
>
> I've built it now and it works...
>
> > This is pretty much the same chain of reasoning that Bruce and I went
> > through yesterday to come up with the idea of putting a label file
> > inside the tar backups. We concluded that it'd be worth putting
> > both the backup starting time and the checkpoint WAL position into
> > the label file --- the starting time isn't needed for restore but
> > might be really helpful as documentation, if you needed to verify
> > which dump file was which.
>
> ...if you are doing tar backups...what will you do if you're not using
> that mechanism?
>
> If you are: It's common practice to make up a backup filename from
> elements such as systemname, databasename, date and time etc. That gives
> you the start time, the file last mod date gives you the end time.
>
> I think its perfectly fine for everybody to do backups any way they
> please. There are many licenced variants of PostgreSQL and it might be
> appropriate in those to specify particular ways of doing things.
>
> I'll be trusting the management of backup metadata and storage media to
> a solution designed for the purpose (open or closed source), just as
> I'll be trusting my data to a database solution designed for that
> purpose. That for me is one of the good things about PostgreSQL - we use
> the filesystem, we don't write our own, we provide language interfaces
> not invent our own proprietary server language etc..
>
> Best Regards, Simon Riggs
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2004-07-17 00:08:01 Re: Point in Time Recovery
Previous Message Andreas Pflug 2004-07-16 22:59:47 Re: serverlog rotation/functions