Fwd: Problems waking up from a warm standby

From: "Ori Garin" <garin(at)textkernel(dot)nl>
To: pgsql-admin(at)postgresql(dot)org
Subject: Fwd: Problems waking up from a warm standby
Date: 2008-10-30 16:58:55
Message-ID: 9c64d57f0810300958l7f3465e8nb4724ad291fa4d47@mail.gmail.com
Lists: pgsql-admin

Hi.

I have a problem with a warm standby server running on Windows Server 2003 R2
Enterprise x64 Edition. I use Postgres 8.3 (installed to C:\Program Files
(x86)\).
Everything was working fine (base backup, archiving, recovery) until I
tried to test failover.

To do that, I create a trigger file. The restore command then returns a
nonzero exit code, and I get something like:

2008-10-29 16:39:23 CET LOG: restored log file "0000000100000011000000CF" from archive
2008-10-29 16:42:25 CET LOG: restored log file "0000000100000011000000D0" from archive
2008-10-29 16:45:56 CET LOG: restored log file "0000000100000011000000D1" from archive
2008-10-29 16:49:02 CET LOG: restored log file "0000000100000011000000D2" from archive
2008-10-29 16:58:45 CET LOG: could not open file "pg_xlog/0000000100000011000000D3" (log file 17, segment 211): No such file or directory
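
For reference, the numbers in that last message can be read straight off the
segment name. A minimal sketch, assuming the name consists of three 8-digit
hexadecimal fields (timeline, log file, segment); the function name
parse_wal_name is made up for illustration:

```python
def parse_wal_name(name):
    """Split a 24-character WAL segment file name into its three hex fields."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)    # first 8 hex digits
    log_id = int(name[8:16], 16)     # middle 8 hex digits
    segment = int(name[16:24], 16)   # last 8 hex digits
    return timeline, log_id, segment


# The segment named in the "could not open file" message above.
print(parse_wal_name("0000000100000011000000D3"))  # -> (1, 17, 211)
```

This matches the "log file 17, segment 211" in the log line, so the startup
process is asking for the segment right after the last one restored (...D2).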

Now Postgres seems to be stuck or dead. Using Process Explorer, I see that
one of the postgres.exe processes has started drwtsn32.exe (the Dr. Watson
postmortem debugger), which runs indefinitely. If I kill drwtsn32, Postgres
dies too:

2008-10-29 16:58:45 CET LOG: could not open file "pg_xlog/0000000100000011000000D3" (log file 17, segment 211): No such file or directory
2008-10-29 17:20:27 CET LOG: startup process (PID 2952) was terminated by exception 0xC000000D
2008-10-29 17:20:27 CET HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
2008-10-29 17:20:27 CET LOG: aborting startup due to startup process failure
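
Following the hint, the exception value can be split into the standard
NTSTATUS fields. This is only a rough sketch: the lookup table is deliberately
tiny (two values as defined in ntstatus.h), and decode_ntstatus is an
illustrative name, not part of any library:

```python
# Small, incomplete lookup table; both values are defined in ntstatus.h.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}


def decode_ntstatus(code):
    """Break a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": (code >> 30) & 0x3,    # top two bits; 3 means "error"
        "customer": (code >> 29) & 0x1,    # 0 for Microsoft-defined codes
        "facility": (code >> 16) & 0xFFF,  # 12-bit facility number
        "code": code & 0xFFFF,             # low 16 bits
        "name": KNOWN_NTSTATUS.get(code, "unknown"),
    }


print(decode_ntstatus(0xC000000D)["name"])  # -> STATUS_INVALID_PARAMETER
```

That only names the exception. It does not tell me which call received the
invalid parameter.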

At first I thought the message "could not open file..." was the problem,
because I figured Postgres must believe that the restore command succeeded;
otherwise it wouldn't try to open the next WAL file. I'm no longer sure
that's true.
As the restore command, I first used my own (compiled) Perl script, then
switched to pg_standby, and then went back to the script. Nothing works; even
killing the restore_command process causes the same problem.
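
To make clear what the restore command is supposed to do, here is a simplified
sketch of the logic, written in Python for brevity. It is not my actual Perl
script and not how pg_standby is implemented; the function names, argument
order, and polling interval are all assumptions for illustration:

```python
import os
import shutil
import sys
import time


def restore(archive_dir, wal_name, dest_path, trigger_path, poll_seconds=5.0):
    """Wait for wal_name to appear in archive_dir and copy it to dest_path.

    Returns 0 once the file has been copied, or 1 if the trigger file
    exists while the requested segment is still missing.
    """
    source = os.path.join(archive_dir, wal_name)
    while True:
        if os.path.isfile(source):
            # Segment is available: hand it to the server and report success.
            shutil.copyfile(source, dest_path)
            return 0
        if os.path.exists(trigger_path):
            # Failover requested: a nonzero exit code tells the server
            # that no more WAL is coming.
            return 1
        time.sleep(poll_seconds)


def main(argv):
    # Hypothetical recovery.conf entry (paths are placeholders):
    #   restore_command = 'python restore.py <archive_dir> %f "%p" <trigger_file>'
    archive_dir, wal_name, dest_path, trigger_path = argv[1:5]
    return restore(archive_dir, wal_name, dest_path, trigger_path)
```

A real script would end with sys.exit(main(sys.argv)). This sketch does not
guard against copying a segment that is still being written to the archive.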

I just noticed an Application Error in the Event Log:

Event Type: Error
Event Source: Application Error
Event Category: (100)
Event ID: 1000
Date: 29-10-2008
Time: 16:58:47
User: N/A
Computer: S1217
Description:
Faulting application postgres.exe, version 8.3.1.876, faulting module
msvcr80.dll, version 8.0.50727.762, fault address 0x0001e879.

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 41 70 70 6c 69 63 61 74 Applicat
0008: 69 6f 6e 20 46 61 69 6c ion Fail
0010: 75 72 65 20 20 70 6f 73 ure pos
0018: 74 67 72 65 73 2e 65 78 tgres.ex
0020: 65 20 38 2e 33 2e 31 2e e 8.3.1.
0028: 38 37 36 20 69 6e 20 6d 876 in m
0030: 73 76 63 72 38 30 2e 64 svcr80.d
0038: 6c 6c 20 38 2e 30 2e 35 ll 8.0.5
0040: 30 37 32 37 2e 37 36 32 0727.762
0048: 20 61 74 20 6f 66 66 73 at offs
0050: 65 74 20 30 30 30 31 65 et 0001e
0058: 38 37 39 879

Does anyone have any ideas about this?
Thanks in advance!

Ori
