Re: Documentation on PITR still scarce

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Documentation on PITR still scarce
Date: 2004-11-06 11:13:34
Message-ID: 1099739614.6942.174.camel@localhost.localdomain
Lists: pgsql-hackers

On Sat, 2004-11-06 at 00:54, Joachim Wieland wrote:
> Hi,
>
> On Fri, Nov 05, 2004 at 10:26:55PM +0000, Simon Riggs wrote:
> > That is exactly the situation Timelines are designed to avoid. This
> > should not have happened. What leads you to think it has? My guess is
> > that it has not. If it has, its a bug.
>
> Hmm. I did the following:
>
> - I recovered to one PIT.
> - I verified that everything was fine.
> - If I shut down postmaster now and try to recover to another PIT,
> everything will work fine. (by re-restoring the original backup as you
> pointed out)
>
> However if I:
>
> - Shut down postmaster and restart it in normal mode (without a new
> recovery.conf) and then do some database operations, it seems to
> overwrite a file from my archive:
>

Right. You have not done a correct archive recovery and so, yes, you
will get that failure. The database can't know about your activities -
you do, and you know they are wrong, so you should expect an error.

The timeline code only comes into effect when you request an archive
recovery. If you do not, it has no way of knowing it "should have".

This error is possible because of two things:
i) when PostgreSQL starts up, the only things it knows about are in the
files in the data directory... it has no other "memory" like humans
do...if you put an incorrect set of files there for it, then it will
be...incorrect
ii) PostgreSQL hands off responsibility for management of the archive to
you. Using a simple copy command is not the best way to protect your
important data archives - it's just an example for understanding and
testing.

It doesn't and can't know what you have done, so cannot itself avoid
*requesting* the overwrite. You are the only one who can determine that
the *request* to archive would cause an error.

I can see that this exposes a window for user error, and we should
document this. The correct way to get around this potential error is to:
i) follow the instructions
ii) or, for safety, write a script that checks for the existence of the
file in the archive before it does the copy.

so then set archive_command = "copy2myarchive ...."

where copy2myarchive
- checks for file existence in the archive, and aborts if the file exists
- does the copy
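
As a rough illustration of the two steps above, here is a minimal sketch of
such a script in Python. Everything in it is an assumption for the sake of
example: the function name copy_to_archive, the two-argument calling
convention (source path, archive directory) and the exit codes are not part
of PostgreSQL. It opens the target with O_EXCL so that the existence check
and the file creation happen as one step, and it returns non-zero when the
file is already in the archive so that the archiver sees a failure.

```python
import os
import shutil
import sys


def copy_to_archive(src_path, archive_dir):
    """Copy src_path into archive_dir without ever overwriting a file.

    Returns 0 on success and 1 if the target already exists.
    (Hypothetical helper; the name and return codes are assumptions.)
    """
    dest_path = os.path.join(archive_dir, os.path.basename(src_path))
    try:
        # O_EXCL makes "check for existence" and "create" a single step,
        # so two concurrent invocations cannot both succeed.
        fd = os.open(dest_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        sys.stderr.write("refusing to overwrite %s\n" % dest_path)
        return 1
    try:
        with os.fdopen(fd, "wb") as dest, open(src_path, "rb") as src:
            shutil.copyfileobj(src, dest)
            dest.flush()
            os.fsync(dest.fileno())  # make sure the copy reaches disk
    except Exception:
        # Do not leave a partial file behind in the archive.
        os.unlink(dest_path)
        raise
    return 0


# Only act as a command when called with exactly two arguments:
#   copy2myarchive <source file> <archive directory>
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(copy_to_archive(sys.argv[1], sys.argv[2]))
```

Saved as an executable named copy2myarchive, it could be wired in with
something like archive_command = 'copy2myarchive %p /path/to/archive', where
the archive path is a placeholder. A non-zero exit status tells the server
that archiving failed, which is the behaviour you want when the file is
already there.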

Timelines are brilliant, but they don't protect you from everything.

> [...recovery...]
> LOG: archive recovery complete
> LOG: database system is ready
> LOG: archived transaction log file "00000002.history"
>
> Now we are at timeline 2 I guess.
>
> [...normal startup...]
> LOG: checkpoint record is at 0/22701F8
> LOG: redo record is at 0/22701F8; undo record is at 0/0; shutdown TRUE
> LOG: next transaction ID: 2595; next OID: 231915
> LOG: database system is ready
> [...I do some database action...]
> LOG: archived transaction log file "000000010000000000000001"
> LOG: archived transaction log file "000000020000000000000002"
>
>
> If I stop postmaster again, wipe out my data/ dir and re-restore the
> original backup, I can't do any PITRs any more... If I re-install my archive
> as well, it works again.
>
>
> > > My question is: When I've restored up to the time t_0, how can I go on
> > > to restore up to another point in time, later than t_0 but before the
> > > end of my log files.
>
> > You need to re-restore the original backup.
>
> Ah. Ok. I had the impression that the timelines save me from re-restoring
> the original files and that I could start off directly from there. Ok,
> that's why it didn't work out that well ;-)
>

Once you have brought up a database in timeline N+1, you can't use it as
the base to recover to a point in timeline N because the data file
contents cannot be trusted to be identical to the way they were in
timeline N. Re-restoring the backup sounds like something that needs
optimization, but it is required for transactional correctness.
[There is some slight area of improvement, but I don't wish to explain
this because it might lure people into error by mentioning it...the code
currently requires re-restoring]

--
Best Regards, Simon Riggs
