Re: Postgres point-in-time recovery failure

From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Cheryl Grant *EXTERN*" <cheryl(dot)grant(at)aapt(dot)com(dot)au>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Postgres point-in-time recovery failure
Date: 2013-02-27 08:58:45
Message-ID: A737B7A37273E048B164557ADEF4A58B057B8F1A@ntex2010a.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Cheryl Grant wrote:
>>> 2844LOG: starting point-in-time recovery to 2013-02-26 12:53:00+11
>>> 2844LOG: restored log file "000000010000017D00000056" from archive
>>> 2844LOG: unexpected pageaddr 17D/2E000000 in log file 381, segment 86,
>>> offset 0
>>> 2844LOG: invalid checkpoint record
>>> 2844FATAL: could not locate required checkpoint record

>> That indicates that the WAL file 000000010000017D00000056 is
>> broken. Are you sure that it is from the PostgreSQL server
>> you backed up? How did you archive the WAL files?

> I'm doing a pg_basebackup to create the instance with -x specified so some of the logs are in the
> pg_xlog directory after the backup. It always seems to fall over with the same error on the first log.
> I've tried this numerous times with different backups and it always fails on the first log.

Ah, but what the above log entry says is that it
took the WAL file from the archive location and
copied it into pg_xlog.

So the WAL file created by the -x switch of
pg_basebackup was overwritten with a file from
the archive.

Does the archive contain a different (= wrong)
copy of the WAL file?

> I've used the same method to create a hot standby which works, but only because streaming replication
> is getting the data across. But this won't work in a disaster recovery situation.

Even with streaming replication that should not
work, if the problem is a bad WAL file in the archive.

> My backup command for the primary WAL logs is a script. Here is the contents of the script:
>
> ls -1 $PGDATA/pg_xlog | while read f; do
> {
> if [ -f $PGDATA/pg_xlog/$f ] ; then
> if [ ! -f $LOGPATH/$f ] ; then
> echo "$PGDATA/pg_xlog/$f" >> $LOGFILE
> cp $PGDATA/pg_xlog/$f $LOGPATH
> status=$?
> echo status=$status >> $LOGFILE
> scp $LOGPATH/$f $SCPHOST:$LOGPATH &
> fi
> fi
> } done;

That's not your archive_command, right?
At what points is this script run?
Could it have copied an incomplete WAL file to the archive?

The good way to archive WAL files is to specify an
appropriate archive_command in postgresql.conf.
Then each WAL file is archived as soon as it is full,
and the PostgreSQL server knows if archiving worked or not.

Yours,
Laurenz Albe

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Charles Sprickman 2013-02-27 10:52:49 Re: logging full queries separately
Previous Message Cheryl Grant 2013-02-26 23:18:30 Re: Postgres point-in-time recovery failure