Re: Problems restarting after database crashed (signal

From: "Scott Marlowe" <smarlowe(at)qwest(dot)net>
To: "Christopher Cashell" <topher-pgsql(at)zyp(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Problems restarting after database crashed (signal
Date: 2004-07-01 05:35:43
Message-ID: 1088660143.4056.12.camel@localhost.localdomain
Lists: pgsql-general

On Wed, 2004-06-30 at 21:41, Scott Marlowe wrote:
> On Wed, 2004-06-30 at 18:57, Christopher Cashell wrote:
> > Yesterday, while attempting to access a database, I received errors
> > saying that the database was inaccessible. After investigating a
> > little, I found the following in the PostgreSQL log files:
> >
> > 2004-06-30 08:30:19 [24119] LOG: checkpoint process (PID 28423) was
> > terminated by signal 11
>
> > Eventually I attempted to shut it down and restart it, however that
> > failed too. When I attempted to shut it down, I discovered a hung
> > 'startup subprocess' that can't be killed.
> >
> > nexus:~# ps aux | grep postgres
> > postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 postgres:
> > startup subprocess
> > nexus:~# kill -9 28424
> > nexus:~# ps aux | grep postgres
> > postgres 28424 0.0 1.5 16804 3044 pts/313 D 08:35 0:06 postgres:
> > startup subprocess
> > nexus:~#
>
> The combination of a Sig 11 failure and a process stuck in a D state
> makes me lean towards thinking it's bad hardware (CPU or memory). Have
> you tested this machine?

Oh, and possibly a buggy kernel or kernel module somewhere as well. I
didn't mean to leave that out; I've had problems with some kernels under
heavy parallel load doing stupid things that look just like this.
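
For what it's worth, here is a rough sketch of how one might pick out processes stuck in that state from `ps aux` output. It assumes a procps-style `ps` where STAT is the eighth column, as in the listing quoted above; the function name `d_state_procs` is just something made up for illustration. A process whose STAT starts with D is in uninterruptible sleep inside the kernel, which is why the `kill -9` above appeared to do nothing.

```shell
# Filter `ps aux` output down to processes in uninterruptible sleep.
# Assumption: STAT is column 8, as in procps-style `ps aux` output.
# A leading "D" means the process is blocked inside the kernel and will
# not act on signals, including SIGKILL, until that operation returns.
d_state_procs() {
    awk '$8 ~ /^D/'
}

# Usage (hypothetical helper name, real ps invocation):
#   ps aux | d_state_procs
```

If that keeps listing the same PID over time, the process is waiting on something in the kernel (often I/O), and the kernel log is probably a better place to look than the PostgreSQL log.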
