
Re: PITR problem

From: Erik Jones <erik(at)myemma(dot)com>
To: wstrzalka <wstrzalka(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PITR problem
Date: 2008-04-28 17:00:52
Message-ID: 5309E09C-113D-4CEB-9776-D6F01D40C4DE@myemma.com
Lists: pgsql-general
On Apr 26, 2008, at 5:11 PM, wstrzalka wrote:

> I have a problem setting up PITR recovery on the database.
>
> I have archive_command set properly and logs are shipping OK. Archive
> timeout is also set (5 min).
>
> When performing pg_start_backup the WAL is, let's say, at position
> 0000000100000001000000D9, then I start copying the database to the second
> machine, which takes me 30 minutes. In that time the archive timeout is
> triggered a few times and those files are shipped properly to the second
> host. After the DB is successfully copied I call pg_stop_backup. The
> WAL is at that moment at position 0000000100000001000000DE.
>
> At that moment I see on the second machine WAL files from
> 0000000100000001000000D9 to 0000000100000001000000DE as well as
> 0000000100000001000000D9.00000020.backup
>
> The problem occurs now when I'm trying to start my standby server in
> recovery mode (with pg_standby).
>
> The output from pg_standby:
> ------------------------------------
> Trigger file             : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file     : 00000001.history
> WAL file path            : /var/lib/pgsql/incoming_wal/
> 00000001.history
> Restoring to...          : pg_xlog/RECOVERYHISTORY
> Sleep interval           : 5 seconds
> Max wait interval        : 0 forever
> Command for restore      : ln -s -f "/var/lib/pgsql/incoming_wal/
> 00000001.history" "pg_xlog/RECOVERYHISTORY"
> Keep archive history     : 0000000100000001000000DB and later
> running restore          : OK
>
>
> Trigger file             : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file     : 0000000100000001000000D9.00000020.backup
> WAL file path            : /var/lib/pgsql/incoming_wal/
> 0000000100000001000000D9.00000020.backup
> Restoring to...          : pg_xlog/RECOVERYHISTORY
> Sleep interval           : 5 seconds
> Max wait interval        : 0 forever
> Command for restore      : ln -s -f "/var/lib/pgsql/incoming_wal/
> 0000000100000001000000D9.00000020.backup" "pg_xlog/RECOVERYHISTORY"
> Keep archive history     : 0000000100000001000000DB and later
> running restore          : OK
>
>
> Trigger file             : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file     : 0000000100000001000000D9
> WAL file path            : /var/lib/pgsql/incoming_wal/
> 0000000100000001000000D9
> Restoring to...          : pg_xlog/RECOVERYXLOG
> Sleep interval           : 5 seconds
> Max wait interval        : 0 forever
> Command for restore      : ln -s -f "/var/lib/pgsql/incoming_wal/
> 0000000100000001000000D9" "pg_xlog/RECOVERYXLOG"
> Keep archive history     : 0000000100000001000000DB and later
> running restore          : OK
> removing "/var/lib/pgsql/incoming_wal/0000000100000001000000D9"
> removing "/var/lib/pgsql/incoming_wal/0000000100000001000000DA"
>
> --------------------------------------------------------------------------------------------------------
>
>
> The first time I start the standby, the Postgres log says the following
> and the postgres process goes down:
> --------------------------------------------------------------------------------------------------------
> restored log file "0000000100000001000000D9.00000020.backup" from
> archive
> could not open file "pg_xlog/0000000100000001000000D9" (log file 1,
> segment 217): No such file or directory
> invalid checkpoint record
> could not locate required checkpoint record
> If you are not restoring from a backup, try removing the file "/var/
> lib/pgsql/data/backup_label".
> startup process (PID 19201) was terminated by signal 6: Aborted
> aborting startup due to startup process failure
> --------------------------------------------------------------------------------------------------------
>
> When I try to start PG for the second time it just gets stuck waiting
> for ...000D9
>
> In my opinion the problem is that when starting, the standby PostgreSQL
> wants to recover WAL 0000000100000001000000D9, but first deletes it,
> as the keep archive history (%r) param is set to
> 0000000100000001000000DB
>
> Is it a bug, or am I missing something? I can repeat the scenario with
> this big DB. However, it's not happening in exactly the same
> environment when playing with a smaller cluster (copying the cluster
> takes less time than archive_timeout).

What is the full pg_standby command string (restore_command=....) in
your recovery.conf?  It sounds like you have pg_standby set to delete
archived WALs, and possibly a little too aggressively.  Do you
have the -k flag set in your pg_standby call in your restore_command?
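
For reference while checking that, here is a rough Python sketch of how the
segment names in your output relate to the "Keep archive history" (%r) value.
It assumes WAL file names are 8 hex digits each of timeline, log file, and
segment, and that cleanup removes every segment ordered before the %r file,
ignoring the timeline. That rule is only my reading of your log, not a
statement of what pg_standby's code does. The names parse_wal_name and
removed_by_cleanup are made up for the illustration.

```python
from typing import NamedTuple


class WalSegment(NamedTuple):
    timeline: int
    log: int
    segment: int


def parse_wal_name(name: str) -> WalSegment:
    """Split a 24-hex-digit WAL file name into timeline, log file, and segment."""
    base = name.split(".", 1)[0]  # drop a suffix such as ".00000020.backup"
    if len(base) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return WalSegment(int(base[0:8], 16), int(base[8:16], 16), int(base[16:24], 16))


def removed_by_cleanup(candidate: str, restart_file: str) -> bool:
    """Assumed rule (hypothetical): segments older than the %r file are removable."""
    c = parse_wal_name(candidate)
    r = parse_wal_name(restart_file)
    return (c.log, c.segment) < (r.log, r.segment)


if __name__ == "__main__":
    restart = "0000000100000001000000DB"  # "Keep archive history" value from the log
    for seg in ("D9", "DA", "DB", "DC", "DD", "DE"):
        name = "0000000100000001000000" + seg
        print(name, parse_wal_name(name), removed_by_cleanup(name, restart))
```

With the values from your output, D9 and DA sort before DB, which matches the
two "removing" lines, while D9 is also the segment the startup process asks
for (log file 1, segment 217).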

Erik Jones

DBA | Emma®
erik(at)myemma(dot)com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



