8.1.3 Crash/Corruption Problem

From: "Chris Hoover" <revoohc(at)gmail(dot)com>
To: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: 8.1.3 Crash/Corruption Problem
Date: 2007-01-08 16:41:30
Message-ID: 1d219a6f0701080841l3a795a08se3852c23bea2e240@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Over the weekend, I suffered from major database corruption on my
8.1.3database. Fortunetly, this was my development/non-production box
so I did
not have to spend my entire weekend getting it back online, but I did spend
a major portion of it. I ended up having to do a pitr recovery of the
system. I don't understand what happened to the system. Please help
explain what happened. Also, would there have been another option to get
the system back online w/o the full recovery?

Thanks,

Chris

Here is the log of the crash:

,19684,,2007-01-06 03:20:04.542 EST,,70594,,,,DEBUG: server process (PID
9375) exited with exit code 0
,19684,,2007-01-06 03:20:04.547 EST,,70595,,,,DEBUG: forked new backend,
pid=9380 socket=8
[unknown],9380,[unknown],2007-01-06 03:20:04.547 EST,459f5bb4.24a4,1,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,LOG: connection received: host=[local]
dba_stats,9380,postgres,2007-01-06 03:20:04.547 EST,459f5bb4.24a4,2,2007-01-06
03:20:04 EST,0,authentication,LOG: connection authorized: user=postgres
database=dba_stats
dba_stats,9380,postgres,2007-01-06 03:20:04.549 EST,459f5bb4.24a4,3,2007-01-06
03:20:04 EST,322692053,idle,LOG: statement: insert into db_transactions
values ('192.168.1.73','d45','0','79443','5','1769025','14411089','01/06/2007
03:20:01');
dba_stats,9380,postgres,2007-01-06 03:20:04.551 EST,459f5bb4.24a4,4,2007-01-06
03:20:04 EST,0,INSERT,LOG: duration: 1.974 ms
dba_stats,9380,postgres,2007-01-06 03:20:04.551 EST,459f5bb4.24a4,5,2007-01-06
03:20:04 EST,0,INSERT,LOG: duration: 1.974 ms statement: insert into
db_transactions values
('192.168.1.73','d45','0','79443','5','1769025','14411089','01/06/2007
03:20:01');
dba_stats,9380,postgres,2007-01-06 03:20:04.551 EST,459f5bb4.24a4,6,2007-01-06
03:20:04 EST,0,idle,LOG: disconnection: session time:
0:00:00.00user=postgres database=dba_stats host=[local]
,19684,,2007-01-06 03:20:04.552 EST,,70596,,,,DEBUG: server process (PID
9380) exited with exit code 0
[unknown],9385,[unknown],2007-01-06 03:20:04.557 EST,459f5bb4.24a9,1,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,LOG: connection received: host=[local]
dba_stats,9385,postgres,2007-01-06 03:20:04.557 EST,459f5bb4.24a9,2,2007-01-06
03:20:04 EST,0,authentication,LOG: connection authorized: user=postgres
database=dba_stats
,19684,,2007-01-06 03:20:04.558 EST,,70597,,,,DEBUG: forked new backend,
pid=9385 socket=8
dba_stats,9385,postgres,2007-01-06 03:20:04.560 EST,459f5bb4.24a9,3,2007-01-06
03:20:04 EST,322692055,idle,LOG: statement: insert into db_transactions
values ('192.168.1.73','d46','0','116796','112','7246902','23447082','01/06/2007
03:20:01');
dba_stats,9385,postgres,2007-01-06 03:20:04.561 EST,459f5bb4.24a9,4,2007-01-06
03:20:04 EST,322692055,INSERT,PANIC: right sibling's left-link doesn't
match
dba_stats,9385,postgres,2007-01-06 03:20:04.561 EST,459f5bb4.24a9,5,2007-01-06
03:20:04 EST,322692055,INSERT,STATEMENT: insert into db_transactions values
('192.168.1.73','d46','0','116796','112','7246902','23447082','01/06/2007
03:20:01');
,19684,,2007-01-06 03:20:04.562 EST,,70598,,,,DEBUG: server process (PID
9385) was terminated by signal 6
,19684,,2007-01-06 03:20:04.562 EST,,70599,,,,LOG: server process (PID
9385) was terminated by signal 6
,19684,,2007-01-06 03:20:04.562 EST,,70600,,,,LOG: terminating any other
active server processes
,19684,,2007-01-06 03:20:04.562 EST,,70601,,,,DEBUG: sending SIGQUIT to
process 8277
,19684,,2007-01-06 03:20:04.562 EST,,70602,,,,DEBUG: sending SIGQUIT to
process 8008
,19684,,2007-01-06 03:20:04.562 EST,,70603,,,,DEBUG: sending SIGQUIT to
process 8227
,19684,,2007-01-06 03:20:04.562 EST,,70604,,,,DEBUG: sending SIGQUIT to
process 8275
sys_logon,8277,postgres,2007-01-06 03:20:04.562 EST,459f5ad5.2055,13,2007-01-06
03:16:21 EST,0,idle,WARNING: terminating connection because of crash of
another server process
sys_logon,8277,postgres,2007-01-06 03:20:04.562 EST,459f5ad5.2055,14,2007-01-06
03:16:21 EST,0,idle,DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
sys_logon,8277,postgres,2007-01-06 03:20:04.562 EST,459f5ad5.2055,15,2007-01-06
03:16:21 EST,0,idle,HINT: In a moment you should be able to reconnect to
the database and repeat your command.
,19684,,2007-01-06 03:20:04.564 EST,,70605,,,,DEBUG: server process (PID
8277) exited with exit code 1
,19684,,2007-01-06 03:20:04.564 EST,,70606,,,,LOG: all server processes
terminated; reinitializing
,9390,,2007-01-06 03:20:04.603 EST,,1,,,,LOG: database system was
interrupted at 2007-01-06 03:15:04 EST
,9390,,2007-01-06 03:20:04.603 EST,,2,,,,LOG: checkpoint record is at
299/3856B2DC
,9390,,2007-01-06 03:20:04.603 EST,,3,,,,LOG: redo record is at
299/3856B2DC; undo record is at 0/0; shutdown TRUE
,9390,,2007-01-06 03:20:04.603 EST,,4,,,,LOG: next transaction ID:
322691436; next OID: 79454488
,9390,,2007-01-06 03:20:04.603 EST,,5,,,,LOG: next MultiXactId: 9; next
MultiXactOffset: 17
,9390,,2007-01-06 03:20:04.603 EST,,6,,,,LOG: database system was not
properly shut down; automatic recovery in progress
[unknown],9391,[unknown],2007-01-06 03:20:04.604 EST,459f5bb4.24af,1,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,LOG: connection received: host=[local]
dba_stats,9391,postgres,2007-01-06 03:20:04.604 EST,459f5bb4.24af,2,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,FATAL: the database system is starting
up
,19684,,2007-01-06 03:20:04.604 EST,,70607,,,,DEBUG: forked new backend,
pid=9391 socket=8
,19684,,2007-01-06 03:20:04.605 EST,,70608,,,,DEBUG: server process (PID
9391) exited with exit code 0
,9390,,2007-01-06 03:20:04.607 EST,,7,,,,LOG: redo starts at 299/3856B320
[unknown],9396,[unknown],2007-01-06 03:20:04.612 EST,459f5bb4.24b4,1,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,LOG: connection received: host=[local]
dba_stats,9396,postgres,2007-01-06 03:20:04.612 EST,459f5bb4.24b4,2,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,FATAL: the database system is starting
up
,19684,,2007-01-06 03:20:04.614 EST,,70609,,,,DEBUG: forked new backend,
pid=9396 socket=8
,19684,,2007-01-06 03:20:04.614 EST,,70610,,,,DEBUG: server process (PID
9396) exited with exit code 0
[unknown],9401,[unknown],2007-01-06 03:20:04.620 EST,459f5bb4.24b9,1,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,LOG: connection received: host=[local]
dba_stats,9401,postgres,2007-01-06 03:20:04.620 EST,459f5bb4.24b9,2,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,FATAL: the database system is starting
up
,19684,,2007-01-06 03:20:04.622 EST,,70611,,,,DEBUG: forked new backend,
pid=9401 socket=8
,19684,,2007-01-06 03:20:04.622 EST,,70612,,,,DEBUG: server process (PID
9401) exited with exit code 0
,9390,,2007-01-06 03:20:04.622 EST,,8,,,,LOG: record with zero length at
299/386E998C
,9390,,2007-01-06 03:20:04.622 EST,,9,,,,LOG: redo done at 299/386E9964
[unknown],9406,[unknown],2007-01-06 03:20:04.628 EST,459f5bb4.24be,1,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,LOG: connection received: host=[local]
dba_stats,9406,postgres,2007-01-06 03:20:04.629 EST,459f5bb4.24be,2,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,FATAL: the database system is starting
up
,19684,,2007-01-06 03:20:04.630 EST,,70613,,,,DEBUG: forked new backend,
pid=9406 socket=8
,19684,,2007-01-06 03:20:04.630 EST,,70614,,,,DEBUG: server process (PID
9406) exited with exit code 0
[unknown],9411,[unknown],2007-01-06 03:20:04.636 EST,459f5bb4.24c3,1,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,LOG: connection received: host=[local]
dba_stats,9411,postgres,2007-01-06 03:20:04.636 EST,459f5bb4.24c3,2,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,FATAL: the database system is starting
up
,19684,,2007-01-06 03:20:04.638 EST,,70615,,,,DEBUG: forked new backend,
pid=9411 socket=8
,19684,,2007-01-06 03:20:04.638 EST,,70616,,,,DEBUG: server process (PID
9411) exited with exit code 0
[unknown],9416,[unknown],2007-01-06 03:20:04.643 EST,459f5bb4.24c8,1,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,LOG: connection received: host=[local]
dba_stats,9416,postgres,2007-01-06 03:20:04.644 EST,459f5bb4.24c8,2,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,FATAL: the database system is starting
up
,19684,,2007-01-06 03:20:04.645 EST,,70617,,,,DEBUG: forked new backend,
pid=9416 socket=8
,19684,,2007-01-06 03:20:04.645 EST,,70618,,,,DEBUG: server process (PID
9416) exited with exit code 0
[unknown],9421,[unknown],2007-01-06 03:20:04.652 EST,459f5bb4.24cd,1,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,LOG: connection received: host=[local]
dba_stats,9421,postgres,2007-01-06 03:20:04.652 EST,459f5bb4.24cd,2,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,FATAL: the database system is starting
up
,19684,,2007-01-06 03:20:04.655 EST,,70619,,,,DEBUG: forked new backend,
pid=9421 socket=8
,19684,,2007-01-06 03:20:04.655 EST,,70620,,,,DEBUG: server process (PID
9421) exited with exit code 0
[unknown],9426,[unknown],2007-01-06 03:20:04.660 EST,459f5bb4.24d2,1,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,LOG: connection received: host=[local]
dba_stats,9426,postgres,2007-01-06 03:20:04.661 EST,459f5bb4.24d2,2,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,FATAL: the database system is starting
up
,19684,,2007-01-06 03:20:04.662 EST,,70621,,,,DEBUG: forked new backend,
pid=9426 socket=8
,19684,,2007-01-06 03:20:04.662 EST,,70622,,,,DEBUG: server process (PID
9426) exited with exit code 0
[unknown],9431,[unknown],2007-01-06 03:20:04.668 EST,459f5bb4.24d7,1,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,LOG: connection received: host=[local]
dba_stats,9431,postgres,2007-01-06 03:20:04.668 EST,459f5bb4.24d7,2,2007-01-06
03:20:04 EST,0,/usr/bin/postmaster,FATAL: the database system is starting
up
,9390,,2007-01-06 03:20:04.669 EST,,10,,,,PANIC: failed to re-find parent
key in "3000108689"
,19684,,2007-01-06 03:20:04.669 EST,,70623,,,,DEBUG: forked new backend,
pid=9431 socket=8
,19684,,2007-01-06 03:20:04.669 EST,,70624,,,,DEBUG: server process (PID
9431) exited with exit code 0
,19684,,2007-01-06 03:20:04.673 EST,,70625,,,,LOG: startup process (PID
9390) was terminated by signal 6
,19684,,2007-01-06 03:20:04.673 EST,,70626,,,,LOG: aborting startup due to
startup process failure
,19685,,2007-01-06 03:20:04.681 EST,,1,,,,LOG: logger shutting down

Here is the log of the first try at starting the database:

,2960,,2007-01-06 08:59:13.996 EST,,1,,,,LOG: database system was
interrupted while in recovery at 2007-01-06 03:20:04 EST
,2960,,2007-01-06 08:59:13.996 EST,,2,,,,HINT: This probably means that
some data is corrupted and you will have to use the last backup for
recovery.
,2960,,2007-01-06 08:59:13.996 EST,,3,,,,LOG: checkpoint record is at
299/3856B2DC
,2960,,2007-01-06 08:59:13.996 EST,,4,,,,LOG: redo record is at
299/3856B2DC; undo record is at 0/0; shutdown TRUE
,2960,,2007-01-06 08:59:13.996 EST,,5,,,,LOG: next transaction ID:
322691436; next OID: 79454488
,2960,,2007-01-06 08:59:13.996 EST,,6,,,,LOG: next MultiXactId: 9; next
MultiXactOffset: 17
,2960,,2007-01-06 08:59:13.996 EST,,7,,,,LOG: database system was not
properly shut down; automatic recovery in progress
,2960,,2007-01-06 08:59:14.192 EST,,8,,,,LOG: redo starts at 299/3856B320
,2960,,2007-01-06 08:59:14.205 EST,,9,,,,LOG: record with zero length at
299/386E998C
,2960,,2007-01-06 08:59:14.205 EST,,10,,,,LOG: redo done at 299/386E9964
,2960,,2007-01-06 08:59:14.243 EST,,11,,,,PANIC: failed to re-find parent
key in "3000108689"
,2956,,2007-01-06 08:59:14.246 EST,,2,,,,LOG: startup process (PID 2960)
was terminated by signal 6
,2956,,2007-01-06 08:59:14.246 EST,,3,,,,LOG: aborting startup due to
startup process failure
,2959,,2007-01-06 08:59:14.254 EST,,1,,,,LOG: logger shutting down

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2007-01-08 17:25:19 Re: 8.1.3 Crash/Corruption Problem
Previous Message Simon Riggs 2007-01-08 16:28:45 Re: PITR recovery