Re: Unable to start postgres in recovery mode.

From: "Dhaval Shah" <dhaval(dot)shah(dot)m(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Unable to start postgres in recovery mode.
Date: 2007-03-20 23:01:26
Message-ID: 565237760703201601l1dba95flf9c30008377b32f4@mail.gmail.com
Lists: pgsql-general

Thanks for the email. It helped: after going through it and the docs,
I realized that the "backup" file had the wrong information, or rather
that I had the wrong backup files. That would explain the kind of
errors I have seen.

However, I do have one question. I am setting this up as part of the
HA process, and the standby is a "hot" standby. If the primary fails,
how do I tell the secondary to come out of recovery mode, move
recovery.conf to recovery.done, and start the db? I mean, what
error code shall I return?

If I return a non-zero error code, I get the following result [from
serverlog]:

====
00000001000000000000001B pg_xlog/RECOVERYXLOG
LOG: restored log file "00000001000000000000001B" from archive
00000001000000000000001C pg_xlog/RECOVERYXLOG
[Main: Triggering Recovery!!!] <---- My script detected that it needs
to trigger recovery...
LOG: could not open file "pg_xlog/00000001000000000000001C" (log file
0, segment 28): No such file or directory
LOG: redo done at 0/1B000070
00000001000000000000001B pg_xlog/RECOVERYXLOG
Main: Triggering Recovery!!! <--- My script is called again and the
script says trigger recovery
PANIC: could not open file "pg_xlog/00000001000000000000001B" (log
file 0, segment 27): No such file or directory
LOG: startup process (PID 32167) was terminated by signal 6
LOG: aborting startup due to startup process failure
====
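
For reading the log above, it helps to decode the segment names. This is
a minimal sketch, assuming the usual 24-hex-digit layout (8 digits each
for timeline, log file and segment), which is consistent with the
"(log file 0, segment 28)" annotations in the log. The function name
parse_wal_name is made up for illustration.

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into its three fields.

    Assumed layout: 8 hex digits timeline, 8 hex digits log file,
    8 hex digits segment. Returns (timeline, log_file, segment) as ints.
    """
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % (name,))
    timeline = int(name[0:8], 16)
    log_file = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_file, segment


# The file the server could not open in the log above:
print(parse_wal_name("00000001000000000000001C"))  # (1, 0, 28)
```

Under that assumption, 00000001000000000000001B is segment 27 and
00000001000000000000001C is segment 28 on timeline 1.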

This is what my script is doing:

if ( triggerRecovery() ) {
    print "Main: Triggering Recovery!!! \n";
    return 1;
}

So the question is: once I detect that the primary is down and want to
trigger recovery, what error code shall I return? Or do I have to move
recovery.conf to recovery.done myself and restart the db?
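
One reading of the log above: returning 1 for segment 1C ended redo as
intended, but the server then asked for 1B a second time, the script
returned 1 for that too, and startup hit the PANIC. The following is a
minimal sketch of a restore script that avoids this by only reporting
failure for files that are really absent. It is an illustration, not a
tested recipe: the function name, the trigger-file mechanism and the
paths are assumptions, not part of the original setup.

```python
import os
import shutil
import sys
import time


def restore_segment(archive_dir, wal_name, dest_path, trigger_path,
                    poll_seconds=1.0):
    """Copy wal_name from archive_dir to dest_path.

    Returns 0 once the file has been copied. Returns 1 only when the
    (hypothetical) trigger file exists and the segment is not archived.
    """
    source = os.path.join(archive_dir, wal_name)
    while True:
        if os.path.exists(source):
            # Always hand over a file we actually have, even after the
            # trigger fired: the log shows the server asking for the last
            # replayed segment again once redo is done.
            shutil.copy(source, dest_path)
            return 0
        if os.path.exists(trigger_path):
            # Failover requested and no more WAL to offer: report
            # "file not available" so the server can stop waiting.
            return 1
        # Primary still assumed alive: wait for the next segment to arrive.
        time.sleep(poll_seconds)


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical wiring in recovery.conf (script name and paths invented):
    #   restore_command = 'python restore.py /archive %f %p /tmp/failover'
    sys.exit(restore_segment(sys.argv[1], sys.argv[2],
                             sys.argv[3], sys.argv[4]))
```

The design choice here is that the decision to fail over never overrides
the existence check, which matches the earlier advice in this thread to
return failure only when the requested file is not there.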

Regards
Dhaval

On 3/20/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Dhaval Shah" <dhaval(dot)shah(dot)m(at)gmail(dot)com> writes:
> > What am I doing wrong?
>
> Lying to the server. If you don't have the requested file, return
> failure, don't invent something. There are a number of cases where
> the recovery process asks for files that are quite likely not to exist.
>
> > If I indicate that I do not have the concerned file by returning error
> > code 1, I get the following error in the log:
>
> This may indicate that you have an incomplete backup :-(. It's hard to
> tell from this much info though. What is in pg_control (use
> pg_controldata to dump) and what is in the backup_label file (that's
> plain text)? What WAL segment files do you actually have?
>
> regards, tom lane
>

--
Dhaval Shah
