Re: BUG #4854: Problems with replaying WAL files on Warm Standby

From: Keith Pierno <kpierno(at)lulu(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4854: Problems with replaying WAL files on Warm Standby
Date: 2009-06-16 13:17:14
Message-ID: 4A379B5A.2020900@lulu.com
Lists: pgsql-bugs

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Here is the timeline of events (all dates are MM/DD/YYYY):<br>
<br>
    06/09/2009 1310 EDT - Hardware fault on primary database server
db01pri<br>
    06/09/2009 1325 EDT - Failover to warm standby db01sec<br>
    06/12/2009 1615 EDT - db01pri server fixed and OS booted<br>
    06/15/2009 1115 EDT - Started recovery onto db01pri of the hot backup
taken on db01sec at 06/15/2009 0205 EDT<br>
    06/15/2009 1320 EDT - Attempted to start postgres on db01pri in
warm standby mode<br>
    06/15/2009 1325 EDT - WAL replay failed with "unexpected timeline ID"
errors<br>
    06/15/2009 1340 EDT - Started a new hotbackup on db01sec<br>
    06/15/2009 1545 EDT - Started recovery onto db01pri of the hot backup
from 06/15/2009 1340<br>
    06/15/2009 1430 EDT - db01pri recovered and running in warm standby<br>
<br>
Here are the contents of the pg_xlog directory and the 00000004.history
file:<br>
<br>
[postgres(at)db01pri ~]$  cat 00000004.history <br>
1    0000000100000736000000A1    before transaction 0 at 1999-12-31
19:00:00-05<br>
[postgres(at)db01pri ~]$  ls -l <br>
total 98468<br>
-rw-------  1 postgres postgres       74 Jul 10  2008 00000002.history<br>
-rw-------  1 postgres postgres       74 Jun  9 13:29 00000003.history<br>
-rw-------  1 postgres postgres 16777216 Jun 16 08:45
0000000400000749000000C9<br>
-rw-------  1 postgres postgres 16777216 Jun 16 08:46
0000000400000749000000CA<br>
-rw-------  1 postgres postgres 16777216 Jun 16 08:47
0000000400000749000000CB<br>
-rw-------  1 postgres postgres       74 Jun  9 13:33 00000004.history<br>
drwxr-xr-x  2 postgres postgres    32768 Jun 16 08:46 archive_status<br>
-rw-------  1 postgres postgres 16777216 Jun  9 13:45 xlogtemp.17243<br>
-rw-------  1 postgres postgres 16777216 Jun  9 13:45 xlogtemp.17244<br>
-rw-------  1 postgres postgres 16777216 Jun  9 13:52 xlogtemp.17397<br>
[postgres(at)db01pri ~]$ <br>
<br>
Thanks again,<br>
<br>
Keith<br>
<br>
Tom Lane wrote:
<blockquote cite="mid:27715(dot)1245108907(at)sss(dot)pgh(dot)pa(dot)us" type="cite">
<pre wrap="">Keith Pierno <a class="moz-txt-link-rfc2396E" href="mailto:kpierno(at)lulu(dot)com">&lt;kpierno(at)lulu(dot)com&gt;</a> writes:
</pre>
<blockquote type="cite">
<pre wrap="">The backup used was from well after the failover time which is why I
was concerned. Interestingly enough the logs are still all prefixed
with 00000004... That just makes this problem extremely bizarre.
</pre>
</blockquote>
<pre wrap=""><!---->
Hmm, that *is* weird. It seems like the new primary must have reverted
its decision to go from timeline 4 to timeline 6. (Which in itself is
a bit odd; why not timeline 5?)

Can you give us an exact sequence of events on the slave server/new
primary around the time of the failover? Also, what was in the .history
file when you found it, and are there any other history files?

regards, tom lane
</pre>
</blockquote>
<br>
</body>
</html>
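
For readers following the directory listing in this message: a WAL segment file name is 24 hex digits, and the first eight are the timeline ID, which is why segments prefixed with 00000004 belong to timeline 4. The snippet below is a minimal sketch that pulls the timeline out of the file names quoted above. `parse_wal_name` is a hypothetical helper written for illustration, not part of PostgreSQL, and it assumes the standard 8+8+8 hex-digit naming.

```python
import re

# A WAL segment name is 24 hex digits: timeline ID, log file number,
# and segment number, 8 digits each (assumed standard naming).
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def parse_wal_name(name):
    """Hypothetical helper: return (timeline, log, segment) or None."""
    m = WAL_NAME.match(name)
    if m is None:
        return None  # not a WAL segment, e.g. *.history or xlogtemp.*
    return tuple(int(part, 16) for part in m.groups())


# File names copied from the pg_xlog listing in this message.
listing = [
    "00000002.history",
    "00000003.history",
    "0000000400000749000000C9",
    "0000000400000749000000CA",
    "0000000400000749000000CB",
    "00000004.history",
    "archive_status",
    "xlogtemp.17243",
]

# Collect the distinct timeline IDs seen among the WAL segments.
timelines = {parse_wal_name(n)[0] for n in listing if parse_wal_name(n)}
print(timelines)
```

Run against the listing above, every segment reports timeline 4, consistent with the observation that the logs are all prefixed with 00000004.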
