BUG #5452: Server core dumps coming out of recovery mode

From: "Chris Copeland" <chris(at)cope360(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5452: Server core dumps coming out of recovery mode
Date: 2010-05-06 16:24:43
Message-ID: 201005061624.o46GOhkY015782@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5452
Logged by: Chris Copeland
Email address: chris(at)cope360(dot)com
PostgreSQL version: 8.2.16
Operating system: Solaris 10 x86
Description: Server core dumps coming out of recovery mode
Details:

I originally had this problem in 8.2.4 and have not upgraded to 8.2.16 and I
am still having it.

I have two servers with the same hardware, OS, and pg binaries. Log files
are copied from the master to the standby and the standby is run in recovery
mode.

When the standby is triggered to come out of recovery mode, it fails and
generates a core dump. Upon trying to start it after failure, it starts
looking for WAL files that it has already recovered.

Here are the log files. My annotations start with >>

::::::::::::::
postgresql-2010-05-06_102646.log
::::::::::::::
>> First time starting the server after the data directory was copied from
the master
2010-05-06 10:26:46 CDT :LOG: database system was interrupted at 2010-05-06
09:32:20 CDT
2010-05-06 10:26:46 CDT :LOG: starting archive recovery
2010-05-06 10:26:46 CDT :LOG: restore_command = "/opt/data/restore.sh
/opt/wal/archwalremote/%f %p"
2010-05-06 10:26:46 CDT :LOG: restored log file
"00000001000000AF00000048.00000020.backup" from archive
2010-05-06 10:26:47 CDT :LOG: restored log file "00000001000000AF00000048"
from archive
2010-05-06 10:26:47 CDT :LOG: checkpoint record is at AF/48000020
2010-05-06 10:26:47 CDT :LOG: redo record is at AF/48000020; undo record is
at 0/0; shutdown FALSE
2010-05-06 10:26:47 CDT :LOG: next transaction ID: 0/326126241; next OID:
862575
2010-05-06 10:26:47 CDT :LOG: next MultiXactId: 2128; next MultiXactOffset:
4280
2010-05-06 10:26:47 CDT :LOG: automatic recovery in progress
2010-05-06 10:26:47 CDT :LOG: redo starts at AF/48000068
2010-05-06 10:26:48 CDT :LOG: restored log file "00000001000000AF00000049"
from archive
2010-05-06 10:26:50 CDT :LOG: restored log file "00000001000000AF0000004A"
from archive
2010-05-06 10:26:51 CDT :LOG: restored log file "00000001000000AF0000004B"
from archive
2010-05-06 10:26:52 CDT :LOG: restored log file "00000001000000AF0000004C"
from archive
2010-05-06 10:26:54 CDT :LOG: restored log file "00000001000000AF0000004D"
from archive
2010-05-06 10:26:56 CDT :LOG: restored log file "00000001000000AF0000004E"
from archive
2010-05-06 10:26:58 CDT :LOG: restored log file "00000001000000AF0000004F"
from archive
2010-05-06 10:27:00 CDT :LOG: restored log file "00000001000000AF00000050"
from archive
2010-05-06 10:27:02 CDT :LOG: restored log file "00000001000000AF00000051"
from archive
2010-05-06 10:27:04 CDT :LOG: restored log file "00000001000000AF00000052"
from archive
2010-05-06 10:27:24 CDT :LOG: restored log file "00000001000000AF00000053"
from archive
>> WAL files now being restored as they arrive from the master
2010-05-06 10:32:25 CDT :LOG: restored log file "00000001000000AF00000054"
from archive
2010-05-06 10:37:26 CDT :LOG: restored log file "00000001000000AF00000055"
from archive
2010-05-06 10:42:27 CDT :LOG: restored log file "00000001000000AF00000056"
from archive
2010-05-06 10:47:28 CDT :LOG: restored log file "00000001000000AF00000057"
from archive
2010-05-06 10:52:29 CDT :LOG: restored log file "00000001000000AF00000058"
from archive
2010-05-06 10:57:30 CDT :LOG: restored log file "00000001000000AF00000059"
from archive
>> Now I trigger the restore command to return 1 to stop the recovery
2010-05-06 10:59:30 CDT :LOG: could not open file
"pg_xlog/00000001000000AF0000005A" (log file 175, segment 90): No such file
or directory
2010-05-06 10:59:30 CDT :LOG: redo done at AF/59000068
2010-05-06 10:59:30 CDT :PANIC: could not open file
"pg_xlog/00000001000000AF00000059" (log file 175, segment 89): No such file
or directory
2010-05-06 10:59:34 CDT :LOG: startup process (PID 24763) was terminated by
signal 6
2010-05-06 10:59:34 CDT :LOG: aborting startup due to startup process
failure
2010-05-06 10:59:34 CDT :LOG: logger shutting down
::::::::::::::
postgresql-2010-05-06_105934.log
::::::::::::::
>> Attempting to restart server
2010-05-06 10:59:34 CDT :LOG: database system was interrupted while in
recovery at log time 2010-05-06 10:57:20 CDT
2010-05-06 10:59:34 CDT :HINT: If this has occurred more than once some
data may be corrupted and you may need to choose an earlier recovery
target.
2010-05-06 10:59:34 CDT :LOG: starting archive recovery
2010-05-06 10:59:34 CDT :LOG: restore_command = "/opt/data/restore.sh
/opt/wal/archwalremote/%f %p"
2010-05-06 10:59:34 CDT :LOG: could not open file
"pg_xlog/00000001000000AF00000059" (log file 175, segment 89): No such file
or directory
2010-05-06 10:59:34 CDT :LOG: invalid primary checkpoint record
2010-05-06 10:59:34 CDT :LOG: could not open file
"pg_xlog/00000001000000AF00000058" (log file 175, segment 88): No such file
or directory
2010-05-06 10:59:34 CDT :LOG: invalid secondary checkpoint record
2010-05-06 10:59:34 CDT :PANIC: could not locate a valid checkpoint record
2010-05-06 10:59:38 CDT :LOG: startup process (PID 29450) was terminated by
signal 6
2010-05-06 10:59:38 CDT :LOG: aborting startup due to startup process
failure
2010-05-06 10:59:38 CDT :LOG: logger shutting down

Here is some information from the core file. I don't know if this is useful
since this is not a debug build.

----------------------

bsiausstgdb02:/opt/data root# pflags core
core 'core' of 29639: /usr/local/pgsql-8.2.16/bin/postgres -D /opt/data
data model = _ILP32 flags = ORPHAN|MSACCT|MSFORK
/1: flags = 0
sigmask = 0xffffbefc,0x0000ffff cursig = SIGABRT

----------------------

bsiausstgdb02:/opt/data root# mdb core
mdb: core file data for mapping at cbc00000 not saved: Interrupted system
call
mdb: core file data for mapping at ce4b5000 not saved: Interrupted system
call
mdb: core file data for mapping at ce4c0000 not saved: Interrupted system
call
mdb: core file data for mapping at ce4e0000 not saved: Interrupted system
call
mdb: core file data for mapping at ce500000 not saved: Interrupted system
call
mdb: core file data for mapping at ce520000 not saved: Interrupted system
call
mdb: core file data for mapping at ce530000 not saved: Interrupted system
call
mdb: core file data for mapping at ce5c1000 not saved: Interrupted system
call
mdb: core file data for mapping at ce5c6000 not saved: Interrupted system
call
mdb: core file data for mapping at ce5d0000 not saved: Interrupted system
call
mdb: core file data for mapping at ce5ee000 not saved: Interrupted system
call
mdb: core file data for mapping at ce600000 not saved: Interrupted system
call
mdb: core file data for mapping at ce617000 not saved: Interrupted system
call
mdb: core file data for mapping at ce618000 not saved: Interrupted system
call
mdb: core file data for mapping at ce620000 not saved: Interrupted system
call
mdb: core file data for mapping at ce630000 not saved: Interrupted system
call
mdb: core file data for mapping at ce6ff000 not saved: Interrupted system
call
mdb: core file data for mapping at ce706000 not saved: Interrupted system
call
mdb: core file data for mapping at ce710000 not saved: Interrupted system
call
mdb: core file data for mapping at ce762000 not saved: Interrupted system
call
mdb: core file data for mapping at ce770000 not saved: Interrupted system
call
mdb: core file data for mapping at ce78b000 not saved: Interrupted system
call
mdb: core file data for mapping at ce790000 not saved: Interrupted system
call
mdb: core file data for mapping at ce7a6000 not saved: Interrupted system
call
mdb: core file data for mapping at ce7b0000 not saved: Interrupted system
call
mdb: core file data for mapping at ce7c0000 not saved: Interrupted system
call
mdb: core file data for mapping at ce7c4000 not saved: Interrupted system
call
mdb: core file data for mapping at ce7fb000 not saved: Interrupted system
call
mdb: core file data for mapping at ce7fc000 not saved: Interrupted system
call
mdb: warning: librtld_db failed to initialize; shared library information
will not be available
Loading modules: [ ld.so.1 ]
> ::stack
0xce6d5f85(1, 6, 8329ba0, ce6ff000, 8044618, ce6615e9)
0xce6811cb(6, 8329ba0, 0, 0, 0, 0)
0xce6615e9(832d4d0, 16, 8044648, 821ae3b, 0, 0)
0x821b032(0, 82a1510, 12d8, 823f077)
StartupXLOG+0x1122(2, 0, 2, 8339060, 82a9411, 833c840)
BootstrapMain+0x46a(4, 8047880, 82a941a, 2, 32782d, ce7c0d50)
0x818aa17(1f, 1, 0, d, 0, 0)
PostmasterMain+0x633(3, 8338d90, 1, 5a, 8047dec)
main+0x396(3, 8338d90, 8047e1c)
_start+0x80(3, 8047ea8, 8047fd0, 8047fd0, 0, 8047eda)
>

----------------------

If I can provide more information, please let me know and I will get it for
you.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Pavel Golub 2010-05-07 05:51:25 Re: BUG #5451: pg_restore doesn't close input .tar archive
Previous Message Tom Lane 2010-05-06 15:05:26 Re: BUG #5451: pg_restore doesn't close input .tar archive