WAL recovery

From: Andy Shellam <andy(dot)shellam(at)mailnetwork(dot)co(dot)uk>
To: pgsql-admin(at)postgresql(dot)org
Subject: WAL recovery
Date: 2006-02-22 16:26:49
Message-ID: 43FC90C9.3090204@mailnetwork.co.uk
Lists: pgsql-admin

Hi,

I'm trying to get a WAL recovery system set up so I have a hot-spare
database server standing by should my first one fail.
The idea is that every night the WAL logs for that day will be shipped
from the main server to the standby, and the standby will replay them
so that it is up to date.
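
A minimal sketch of the nightly shipping step, in Python. It assumes the live server's archive directory and the standby's archive directory are both reachable as local paths from wherever the script runs (for example over a network mount); the function name and the ".partial" suffix are made up for illustration and are not part of PostgreSQL.

```python
import os
import shutil


def ship_new_segments(archive_dir, standby_dir):
    """Copy files present in archive_dir but missing from standby_dir.

    Returns the sorted list of file names that were copied.
    """
    os.makedirs(standby_dir, exist_ok=True)
    already_there = set(os.listdir(standby_dir))
    shipped = []
    for name in sorted(os.listdir(archive_dir)):
        src = os.path.join(archive_dir, name)
        if name in already_there or not os.path.isfile(src):
            continue
        # Copy under a temporary name first, then rename, so that a
        # restore_command reading standby_dir never picks up a
        # half-written segment under its final name.
        tmp = os.path.join(standby_dir, name + ".partial")
        shutil.copy2(src, tmp)
        os.replace(tmp, os.path.join(standby_dir, name))
        shipped.append(name)
    return shipped
```

On the standby the destination would be the directory the restore_command reads from, /mndata/archive/xlog_archive in the log further down; the source directory depends on how the live server's archive is exposed.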

Every week a full backup will be taken of the live system, and stored
off-site.

So far I've got it working so that:

- My full, base backup from yesterday has been loaded onto the spare
- The WAL logs up to 2PM today have been shipped and replayed onto the
spare - all OK to here

However, whenever I try to ship more logs and replay them, recovery
fails with the following error on the final file:

2006-02-22 15:50:00 GMT LOG: starting archive recovery
2006-02-22 15:50:00 GMT LOG: restore_command = "cp /mndata/archive/xlog_archive/%f %p"
cp: cannot stat `/mndata/archive/xlog_archive/00000001.history': No such file or directory
2006-02-22 15:50:00 GMT LOG: restored log file "0000000100000000000000D9" from archive
2006-02-22 15:50:00 GMT LOG: invalid record length at 0/D9FFDB84
2006-02-22 15:50:00 GMT LOG: invalid primary checkpoint record
2006-02-22 15:50:00 GMT LOG: restored log file "0000000100000000000000D9" from archive
2006-02-22 15:50:00 GMT LOG: restored log file "0000000100000000000000DA" from archive
2006-02-22 15:50:00 GMT LOG: invalid resource manager ID in secondary checkpoint record
2006-02-22 15:50:00 GMT PANIC: could not locate a valid checkpoint record
2006-02-22 15:50:00 GMT LOG: startup process (PID 20792) was terminated by signal 6
2006-02-22 15:50:00 GMT LOG: aborting startup due to startup process failure
2006-02-22 15:50:00 GMT LOG: logger shutting down

However, if I delete my PG data directory, restore the same base backup
from yesterday, and begin recovery, it recovers right up to the last
log file, the one on which the previous roll-forward attempt failed.
The log files were fully archived off the live server to begin with, so
I can't see that they have changed in any way.
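
One way to double-check that the shipped files are complete is sketched below in Python. It assumes the default 16 MB WAL segment size and relies on the 24-hex-digit segment names seen in the log above (8 digits of timeline, 8 of log id, 8 of segment number). The function name is made up for illustration, and the check only rules out truncated or missing segments; it says nothing about whether a second recovery pass is supported.

```python
import os
import re

SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")
EXPECTED_SIZE = 16 * 1024 * 1024  # assumption: default WAL segment size


def check_segments(archive_dir, expected_size=EXPECTED_SIZE):
    """Return (wrong_size, gaps) for the WAL segments in archive_dir.

    wrong_size: names of segments whose size differs from expected_size.
    gaps: (previous, next) name pairs that share a timeline and log id
          but whose segment numbers are not consecutive.
    """
    # Only plain segment files match; .history and .backup files are skipped.
    names = sorted(n for n in os.listdir(archive_dir) if SEGMENT_NAME.match(n))
    wrong_size = [
        n for n in names
        if os.path.getsize(os.path.join(archive_dir, n)) != expected_size
    ]
    gaps = []
    for prev, cur in zip(names, names[1:]):
        # Compare only within one timeline and log id; the rollover from
        # one log id to the next is deliberately not checked here.
        same_log = prev[:16] == cur[:16]
        if same_log and int(cur[16:], 16) - int(prev[16:], 16) != 1:
            gaps.append((prev, cur))
    return wrong_size, gaps
```

Running it against both the live archive and the standby's copy and comparing the results would show whether anything was lost or cut short in transit.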

Is this scenario possible - that you can keep rolling forward over log
files as long as necessary, or do you always have to start from a base
backup? Nothing is changing on the spare, it's literally a sitting duck.

Thanks

Andy
