Problem with PITR Past Particular WAL File

From: Craig McElroy <craig(dot)mcelroy(at)contegix(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Problem with PITR Past Particular WAL File
Date: 2007-10-24 07:20:15
Message-ID: 299A6DFE-38B3-443F-A505-40151A4B74F4@contegix.com
Lists: pgsql-admin

Greetings:
   I am running into a problem during failover recovery of an 8.2.4
database running on a SunOS 5.10 box.  For full disclosure, I am also
using the pg_standby utility from the 8.3 contribs to handle the
replay of the logs on the standby server.

   What I am finding is that if I only allow it to replay up to a
particular WAL file (specifically, 00000001000000180000008A), I am
able to trigger the system to leave recovery mode and it successfully
comes online shortly afterwards, as expected.  I also tried stopping
it at each of a few prior WAL files and got the same result.  Relevant
log lines are as follows:

> Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info]  
> [5699-1] LOG:  restored log file "000000010000001800000088" from  
> archive
> Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info]  
> [5700-1] LOG:  restored log file "000000010000001800000089" from  
> archive
> Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info]  
> [5701-1] LOG:  restored log file "00000001000000180000008A" from  
> archive
> Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info]
> [5702-1] LOG:  could not open file
> "pg_xlog/00000001000000180000008B" (log file 24, segment 139): No
> such file or directory
> Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info]  
> [5703-1] LOG:  redo done at 18/8A0C3BC8
> Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info]  
> [5704-1] LOG:  restored log file "00000001000000180000008A" from  
> archive
> Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info]  
> [5705-1] LOG:  archive recovery complete
> Oct 23 22:46:16 db01b postgres[15894]: [ID 748848 local0.info]  
> [5706-1] LOG:  database system is ready
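
For anyone matching the file names above against the "log file 24, segment 139" part of the message: a WAL segment file name is 24 hex digits, made up of three 8-digit fields (timeline ID, log file number, segment number). The following is only a minimal sketch of that decoding; the function name `parse_wal_name` is made up for illustration and is not part of PostgreSQL or pg_standby.

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into its three fields.

    Hypothetical helper for illustration only; returns
    (timeline, log_file, segment) as integers.
    """
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL file name")
    timeline = int(name[0:8], 16)    # first 8 hex digits: timeline ID
    log_file = int(name[8:16], 16)   # next 8 hex digits: log file number
    segment = int(name[16:24], 16)   # last 8 hex digits: segment number
    return timeline, log_file, segment


if __name__ == "__main__":
    # File the startup process looked for in the log above.
    print(parse_wal_name("00000001000000180000008B"))  # (1, 24, 139)
```

The result agrees with the log line, which reports the missing file as log file 24, segment 139.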

   Now, if I include one more WAL file in the recovery, the
additional WAL file appears to be restored successfully, but when I
trigger the system to leave recovery mode it fails to come fully
online and shuts down a few minutes later.  I also tried stopping it
after each of a few additional WAL files and got the same result.
Relevant log lines are as follows:

> Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info]  
> [5699-1] LOG:  restored log file "000000010000001800000088" from  
> archive
> Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info]  
> [5700-1] LOG:  restored log file "000000010000001800000089" from  
> archive
> Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info]  
> [5701-1] LOG:  restored log file "00000001000000180000008A" from  
> archive
> Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info]  
> [5702-1] LOG:  restored log file "00000001000000180000008B" from  
> archive
> Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info]
> [5703-1] LOG:  could not open file
> "pg_xlog/00000001000000180000008C" (log file 24, segment 140): No
> such file or directory
> Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info]  
> [5704-1] LOG:  redo done at 18/8B2174D0
> Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info]  
> [5705-1] LOG:  restored log file "00000001000000180000008B" from  
> archive
> Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info]  
> [5706-1] LOG:  archive recovery complete
> Oct 23 22:27:06 db01b postgres[91]: [ID 748848 local0.info] [1-1]  
> LOG:  startup process (PID 92) was terminated by signal 11
> Oct 23 22:27:06 db01b postgres[91]: [ID 748848 local0.info] [2-1]  
> LOG:  aborting startup due to startup process failure
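
The "redo done at" positions in both logs can be tied back to the last segment replayed. This is a rough sketch that assumes the default 16 MB WAL segment size and timeline 1 (both are assumptions about this installation, not something the logs state); `segment_for_lsn` is a hypothetical name used only for illustration.

```python
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def segment_for_lsn(lsn, timeline=1):
    """Return the WAL segment file name containing a 'log/offset' position.

    Hypothetical helper; `lsn` is a string such as '18/8B2174D0' as
    printed in the 'redo done at' log lines.
    """
    log_hex, offset_hex = lsn.split("/")
    log_file = int(log_hex, 16)
    # The segment number is the byte offset divided by the segment size.
    segment = int(offset_hex, 16) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, log_file, segment)


if __name__ == "__main__":
    print(segment_for_lsn("18/8A0C3BC8"))  # 00000001000000180000008A
    print(segment_for_lsn("18/8B2174D0"))  # 00000001000000180000008B
```

Under those assumptions, the working run ends its redo inside segment 8A and the failing run ends inside segment 8B, which matches the last file restored in each log.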

   I checked the original server logs around the times that these WAL  
files were originally archived, but could find no problems being  
reported.  Note that for the sake of absolute consistency, all of my  
tests were done against a pristine restored base backup.

   If any of this doesn't make sense, please let me know and I will  
do my best to explain myself better.  I have been banging my head  
against this for many hours so it is certainly possible that I may,  
unbeknownst to myself, be a bit incoherent at this point.

   Any suggestions?  Thanks.

Cheers,
-craig

---
Craig A. McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise

