Errors during recovery of a postgres. Need some help understanding them...

From: "Dhaval Shah" <dhaval(dot)shah(dot)m(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Errors during recovery of a postgres. Need some help understanding them...
Date: 2007-04-10 01:23:07
Message-ID: 565237760704091823v1f5527d3w74f93c1c7fd3040e@mail.gmail.com
Lists: pgsql-general

Here is the situation:

I have a standby postgres which is fed a WAL file every 2 minutes.
Each time it receives a WAL file, it logs the following:

---
LOG: restored log file "000000010000000000000070" from archive
pg_restore::copyWALFile: Moving
/opt/data/mirror/000000010000000000000071 to pg_xlog/RECOVERYXLOG
LOG: restored log file "000000010000000000000071" from archive
pg_restore::copyWALFile: Moving
/opt/data/mirror/000000010000000000000072 to pg_xlog/RECOVERYXLOG
LOG: restored log file "000000010000000000000072" from archive
...
...
pg_restore::copyWALFile: Moving
/opt/data/mirror/000000010000000000000082 to pg_xlog/RECOVERYXLOG
LOG: restored log file "000000010000000000000082" from archive
---

I assume the output above shows a healthy postgres in recovery mode.
The "copyWALFile" lines are my own messages in the server log.

After a while, the primary gives up. That is, it goes down and I am no
longer able to pull any WAL files from it. So I tell the standby that
I do not have any more WAL files to give.

---
LOG: could not open file "pg_xlog/000000010000000000000083" (log file
0, segment 131): No such file or directory
LOG: redo done at 0/8200D280
Main: Triggering recovery
PANIC: could not open file "pg_xlog/000000010000000000000082" (log
file 0, segment 130): No such file or directory
---

The issue above is that I do not have the "001...0083" file, so I
return a "file not found". Further, when postgres then asks me for
"001...0082", I do not have that either, because in the intervening
minutes I have moved that file out of /opt/data/mirror into the
/opt/data/tape directory for long-term tape storage. So how do I make
my standby postgres happy?
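
To make the second request for "001...0082" concrete, here is a minimal
sketch of a restore helper. It assumes the helper is allowed to look in
the tape directory when the mirror no longer has the file, and that it
copies rather than moves, so a repeated request for the same segment
can still be served. The function name fetch_wal and the search order
are hypothetical; only the two directory names come from the setup
described above.

```python
import os
import shutil

def fetch_wal(name, dest, search_dirs=("/opt/data/mirror", "/opt/data/tape")):
    """Copy WAL file `name` to `dest` from the first directory that has it.

    Returns 0 on success and 1 if no directory contains the file, so the
    result can be used directly as a process exit status.
    """
    for directory in search_dirs:
        src = os.path.join(directory, name)
        if os.path.isfile(src):
            # Copy instead of move: the source stays available if the
            # same segment is requested again later.
            shutil.copy2(src, dest)
            return 0
    return 1
```

A wrapper script could pass the result to sys.exit(), so that a missing
file shows up as a non-zero exit status.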

Having run into that situation, the standby also spits out the following:

---
LOG: could not open file "pg_xlog/000000010000000000000082" (log file
0, segment 130): No such file or directory
LOG: invalid primary checkpoint record
LOG: could not open file "pg_xlog/000000010000000000000080" (log file
0, segment 128): No such file or directory
LOG: invalid secondary checkpoint record
---

What seems to be happening is that postgres is looking back in time
for the "0001...0082" and "0001...0080" files.
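
The file names line up with the "log file" and "segment" numbers in the
messages above: the last eight hex digits are the segment number
(0x83 = 131, 0x82 = 130, 0x80 = 128). A minimal sketch of that mapping,
assuming the name consists of three 8-digit hex fields (timeline, log
file, segment), which is consistent with the log lines quoted here.
parse_wal_name is a hypothetical helper, not something postgres
provides.

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL file name: %r" % name)
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log, segment

for wal in ("000000010000000000000083",
            "000000010000000000000082",
            "000000010000000000000080"):
    print(wal, parse_wal_name(wal))
# 000000010000000000000083 (1, 0, 131)
# 000000010000000000000082 (1, 0, 130)
# 000000010000000000000080 (1, 0, 128)
```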

The question I have is: how far back in time does it look? I need to
know that so I can be careful about when I move a WAL file out to
tape. Further, if I know how far back I have to keep my WAL files, I
can devise an effective strategy for removing older files. That is,
given that I generate a WAL file every 2 minutes, do I keep 10 files
or 120 files?
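
Whatever the right number turns out to be, the move-to-tape step could
be driven by a single "keep" count. A minimal sketch, assuming the
names are fixed-width uppercase hex so that sorting them as strings
puts them in sequence; split_for_archival and the keep parameter are
hypothetical, and the sketch does not say what value of keep is safe.

```python
def split_for_archival(wal_names, keep):
    """Return (to_tape, kept): the newest `keep` names stay in the mirror."""
    ordered = sorted(wal_names)
    if keep <= 0:
        return ordered, []
    return ordered[:-keep], ordered[-keep:]

# Segments 0x70 through 0x82, as in the log excerpt above.
names = ["0000000100000000000000%02X" % n for n in range(0x70, 0x83)]
to_tape, kept = split_for_archival(names, keep=10)
print(len(to_tape), "to tape;", len(kept), "kept, oldest is", kept[0])
```

With keep=10 both "0001...0082" and "0001...0080" would still be in the
mirror directory; with keep=2 the "0001...0080" file would already have
been moved out.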

Any insight on this will help.

Regards
Dhaval
