Problems restarting after database crashed (signal 11).

From: Christopher Cashell <topher-pgsql(at)zyp(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Problems restarting after database crashed (signal 11).
Date: 2004-07-01 00:57:35
Message-ID: 20040701005735.GA30122@zyp.org
Lists: pgsql-general

Yesterday, while attempting to access a database, I received errors
saying that the database was inaccessible. After investigating a
little, I found the following in the PostgreSQL log files:

2004-06-30 08:30:19 [24119] LOG: checkpoint process (PID 28423) was
terminated by signal 11
2004-06-30 08:30:19 [24119] LOG: terminating any other active server
processes
2004-06-30 08:30:19 [28383] WARNING: terminating connection because of
crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
2004-06-30 08:30:19 [28362] WARNING: terminating connection because of
crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.

The last bit then repeated a few more times, and then:

2004-06-30 08:30:20 [24119] LOG: all server processes terminated;
reinitializing
2004-06-30 08:30:20 [28424] LOG: database system was interrupted at 2004-06-30
08:22:23 CDT
2004-06-30 08:30:20 [28424] LOG: checkpoint record is at 8/77703F9C
2004-06-30 08:30:20 [28424] LOG: redo record is at 8/775B1D38; undo
record is at 0/0; shutdown FALSE
2004-06-30 08:30:20 [28424] LOG: next transaction ID: 1638554; next
OID: 1058492
2004-06-30 08:30:20 [28424] LOG: database system was not properly shut
down; automatic recovery in progress
2004-06-30 08:30:20 [28424] LOG: redo starts at 8/775B1D38
2004-06-30 08:30:21 [28430] LOG: connection received: host=[local] port=
2004-06-30 08:30:21 [28430] FATAL: the database system is starting up
2004-06-30 08:30:38 [28424] LOG: record with zero length at 8/78855F38
2004-06-30 08:30:38 [28424] LOG: redo done at 8/78853EE0
2004-06-30 08:31:40 [28449] LOG: connection received: host=[local] port=
2004-06-30 08:31:40 [28449] FATAL: the database system is starting up
2004-06-30 08:31:48 [28452] LOG: connection received: host=[local] port=
2004-06-30 08:31:48 [28452] FATAL: the database system is starting up
2004-06-30 08:31:53 [28459] LOG: connection received: host=[local] port=
2004-06-30 08:31:53 [28459] FATAL: the database system is starting up

And this then continues on and on. Even 20 minutes later, attempts to
connect to the database were met with the same FATAL error.
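
To put a number on how long the server kept refusing connections after
redo had already finished, the log above can be scanned mechanically.
The following is only a rough sketch: the function name and the regular
expression are my own, and it assumes each log entry starts on a single
line in the "timestamp [pid] LEVEL: message" layout shown above.

```python
import re
from datetime import datetime

# Assumed layout: "2004-06-30 08:30:38 [28424] LOG: redo done at 8/78853EE0"
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] (\w+):\s+(.*)$"
)


def seconds_stuck_after_redo(lines):
    """Return seconds between 'redo done' and the last 'starting up' FATAL.

    Returns None if redo never finished or no connection was rejected
    after it finished.
    """
    redo_done = None
    last_rejected = None
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # continuation lines (DETAIL, HINT, wrapped text)
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        level, message = match.group(3), match.group(4)
        if message.startswith("redo done at"):
            redo_done = stamp
        elif (level == "FATAL" and "starting up" in message
              and redo_done is not None):
            last_rejected = stamp
    if redo_done is None or last_rejected is None:
        return None
    return (last_rejected - redo_done).total_seconds()


# Lines taken from the excerpt above.
SAMPLE_LOG = [
    "2004-06-30 08:30:20 [28424] LOG: redo starts at 8/775B1D38",
    "2004-06-30 08:30:21 [28430] FATAL: the database system is starting up",
    "2004-06-30 08:30:38 [28424] LOG: redo done at 8/78853EE0",
    "2004-06-30 08:31:40 [28449] FATAL: the database system is starting up",
    "2004-06-30 08:31:53 [28459] FATAL: the database system is starting up",
]

if __name__ == "__main__":
    print(seconds_stuck_after_redo(SAMPLE_LOG))
```

Run against the full log file, the result grows for as long as the
startup subprocess fails to finish after its "redo done" line.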

Eventually I attempted to shut it down and restart it, but that
failed too. When I attempted to shut it down, I discovered a hung
'startup subprocess' that can't be killed.

nexus:~# ps aux | grep postgres
postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 postgres:
startup subprocess
nexus:~# kill -9 28424
nexus:~# ps aux | grep postgres
postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 postgres:
startup subprocess
nexus:~#

As soon as I can get physical access to the machine, I'm planning to
reboot it, as I can't think of anything else to do to kill a process
that can't be kill -KILL'ed.
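
For what it's worth, the 'D' in the STAT column of the ps output means
the process is in uninterruptible sleep (usually waiting on I/O inside
the kernel), and signals, SIGKILL included, are not acted on until it
leaves that state. A small sketch for picking such processes out of
"ps aux" output follows. The function name is made up, and it assumes
the standard eleven-column "ps aux" layout with each process on one
unwrapped line.

```python
def uninterruptible_processes(ps_aux_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'.

    Assumes the usual `ps aux` columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    found = []
    for line in ps_aux_output.splitlines():
        # Split into at most 11 fields so COMMAND keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line, or a wrapped/truncated line
        if fields[7].startswith("D"):
            found.append((int(fields[1]), fields[10]))
    return found


# The process line from the transcript above, joined onto one line.
SAMPLE_PS = (
    "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\n"
    "postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 "
    "postgres: startup subprocess\n"
)

if __name__ == "__main__":
    for pid, command in uninterruptible_processes(SAMPLE_PS):
        print(pid, command)
```

Anything this reports will ignore kill -9 until whatever it is waiting
on in the kernel completes.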

I'm worried that attempting to start the database after rebooting will
fail in the same way, however. Has anyone seen anything like this
before, or have any ideas on how to proceed?

I'm running on an Intel Pentium Pro box, with Debian GNU/Linux, running
'unstable'. I'm using PostgreSQL 7.4.3.

Thank you for your help.

--
| Christopher
+------------------------------------------------+
| Here I stand. I can do no other. |
+------------------------------------------------+
