
Re: Problems restarting after database crashed (signal

From:	"Scott Marlowe" <smarlowe(at)qwest(dot)net>
To:	"Christopher Cashell" <topher-pgsql(at)zyp(dot)org>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Problems restarting after database crashed (signal
Date:	2004-07-01 05:35:43
Message-ID:	1088660143.4056.12.camel@localhost.localdomain
Lists:	pgsql-general

On Wed, 2004-06-30 at 21:41, Scott Marlowe wrote:
> On Wed, 2004-06-30 at 18:57, Christopher Cashell wrote:
> > Yesterday, while attempting to access a database, I received errors
> > saying that the database was inaccessible. After investigating a
> > little, I found the following in the PostgreSQL log files:
> >
> > 2004-06-30 08:30:19 [24119] LOG: checkpoint process (PID 28423) was terminated by signal 11
>
> > Eventually I attempted to shut it down and restart it; however, that
> > failed too. When I attempted to shut it down, I discovered a hung
> > 'startup subprocess' that can't be killed.
> >
> > nexus:~# ps aux | grep postgres
> > postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 postgres: startup subprocess
> > nexus:~# kill -9 28424
> > nexus:~# ps aux | grep postgres
> > postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 postgres: startup subprocess
> > nexus:~#
>
> The combination of a Sig 11 failure and a process stuck in a D state
> makes me lean towards thinking it's bad hardware (CPU or memory). Have
> you tested this machine?

Oh, and a buggy kernel or kernel module somewhere could cause this as well.
I didn't mean to leave that out; I've had problems with some kernels under
heavy parallel loads doing stupid things that look just like this.
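The "D" in the STAT column of the ps output above means uninterruptible sleep: the process is blocked inside the kernel (usually waiting on I/O) and will not act on any signal, including SIGKILL, until that kernel call returns. That is why kill -9 had no visible effect. A minimal sketch for spotting such processes, assuming Linux with a procps-style ps; the helper name list_d_state is hypothetical, not part of any PostgreSQL or procps tool:

```shell
#!/bin/sh
# Hypothetical helper (illustration only): reads the output of
#   ps -eo pid,stat,wchan:32,args
# on stdin, skips the header line, and prints only the rows whose STAT
# field starts with "D" (uninterruptible sleep, e.g. "D" or "Ds").
# Assumes Linux and a procps-style ps that accepts these -o columns.
list_d_state() {
  awk 'NR > 1 && $2 ~ /^D/ { print }'
}

# Example usage. The WCHAN column names the kernel function the process
# is sleeping in, which hints at whether it is stuck on disk I/O, in a
# driver, or somewhere else in the kernel:
#   ps -eo pid,stat,wchan:32,args | list_d_state
```

If the same process stays in D state with the same WCHAN entry for a long time, it is likely stuck in the kernel or on the hardware underneath it rather than in PostgreSQL itself. Such a process typically cannot be cleared without a reboot.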

In response to

Re: Problems restarting after database crashed (signal at 2004-07-01 03:41:34 from Scott Marlowe
