Re: unable to fail over to warm standby server

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Mason Hale <mason(at)onespot(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-bugs(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: unable to fail over to warm standby server
Date: 2010-01-29 16:02:26
Message-ID: 3f0b79eb1001290802p56af2093t10a77b82f36bc5bf@mail.gmail.com
Lists: pgsql-bugs
On Fri, Jan 29, 2010 at 11:49 PM, Mason Hale <mason(at)onespot(dot)com> wrote:
> While I did not remove the trigger file, I did rename recovery.conf to
> recovery.conf.old.
> That file contained the recovery_command configuration that identified the
> trigger file. So that rename should have eliminated the problem. But it
> didn't. Even after making this change and taking the trigger file out of the
> equation my database failed to come online.

Renaming recovery.conf does not resolve the problem at all. Instead,
the sysadmin only needed to remove the trigger file that had the wrong
permissions and then restart postgres.
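
A quick way to spot such a trigger file before failing over is to check,
as the postgres user, whether the file can be read and removed. The
following is only a rough sketch: the function name is hypothetical, and
it assumes that "wrong permission" means the postgres user could not read
the file or write to its directory. It is not part of pg_standby.

```python
import os


def trigger_file_problems(path):
    """Return reasons why the current user could not read or remove `path`.

    Hypothetical helper: an empty list means no problem was detected.
    """
    if not os.path.exists(path):
        return ["trigger file does not exist"]

    problems = []
    # Assumption: the process using the trigger file needs read access to it.
    if not os.access(path, os.R_OK):
        problems.append("trigger file is not readable by this user")

    # Removing a file requires write and search permission on its directory.
    parent = os.path.dirname(os.path.abspath(path))
    if not os.access(parent, os.W_OK | os.X_OK):
        problems.append("directory is not writable, so the file cannot be removed")

    return problems
```

Run it as the same OS user that runs postgres; a non-empty result suggests
the trigger file should be removed or recreated by that user.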

>> 9.) The server did not come up (again). This time the contents of the
>> new postgresql.log file were:
>>
>> [postgres(at)prod-db-2 pg_log]$ tail -n 100 postgresql-2010-01-18_211132.log
>> 2010-01-18 21:11:32 UTC ()LOG:  database system was interrupted while in recovery at log time 2010-01-18 20:10:59 UTC
>> 2010-01-18 21:11:32 UTC ()HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
>> 2010-01-18 21:11:32 UTC ()LOG:  could not open file "pg_xlog/0000000200003C82000000A3" (log file 15490, segment 163): No such file or directory
>> 2010-01-18 21:11:32 UTC ()LOG:  invalid primary checkpoint record
>> 2010-01-18 21:11:32 UTC ()LOG:  could not open file "pg_xlog/0000000200003C8200000049" (log file 15490, segment 73): No such file or directory
>> 2010-01-18 21:11:32 UTC ()LOG:  invalid secondary checkpoint record
>> 2010-01-18 21:11:32 UTC ()PANIC:  could not locate a valid checkpoint record
>> 2010-01-18 21:11:32 UTC ()LOG:  startup process (PID 9328) was terminated by signal 6: Aborted
>> 2010-01-18 21:11:32 UTC ()LOG:  aborting startup due to startup process failure

You seem to be focusing on the failure above. I think it happened because
recovery.conf had been renamed, so no restore_command was supplied. As a
result, the WAL file required for recovery (e.g.,
pg_xlog/0000000200003C82000000A3) could not be restored from the archive,
and recovery failed.
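
To see which WAL segment the log is asking for, the 24-character file name
can be split into three 8-digit hexadecimal fields: timeline, log file, and
segment. The sketch below is only illustrative (the function name is
hypothetical), and it assumes the file name layout used by this server
version, which matches the "log file 15490, segment 163" shown in the log.

```python
import string


def parse_wal_segment_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log_file, segment)."""
    if len(name) != 24 or any(c not in string.hexdigits for c in name):
        raise ValueError("expected 24 hexadecimal digits, got %r" % name)

    timeline = int(name[0:8], 16)    # first 8 digits: timeline ID
    log_file = int(name[8:16], 16)   # middle 8 digits: log file number
    segment = int(name[16:24], 16)   # last 8 digits: segment within the log file
    return timeline, log_file, segment


print(parse_wal_segment_name("0000000200003C82000000A3"))  # (2, 15490, 163)
print(parse_wal_segment_name("0000000200003C8200000049"))  # (2, 15490, 73)
```

Both files named in the log are on timeline 2, in log file 15490. These are
the segments that restore_command would have had to fetch from the archive.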

If the sysadmin had left recovery.conf in place and removed the trigger
file, pg_standby in restore_command would have restored all the WAL files
required for recovery, and recovery would have proceeded normally.

Hope this helps.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
