Re: Problems restarting after database crashed (signal 11).

From: Christopher Cashell <topher-pgsql(at)zyp(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Problems restarting after database crashed (signal 11).
Date: 2004-07-01 02:37:58
Message-ID: 20040701023758.GB30122@zyp.org
Lists: pgsql-general

At Wed, 30 Jun 04, Unidentified Flying Banana Tom Lane, said:
> Christopher Cashell <topher-pgsql(at)zyp(dot)org> writes:
> > Eventually I attempted to shut it down and restart it, however that
> > failed too. When I attempted to shut it down, I discovered a hung
> > 'startup subprocess' that can't be killed.
>
> This is interesting because it seems just about exactly like this
> recent Red Hat bug report:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=126885

Hrm. Yes, it does appear to be a very similar, if not identical, issue.

> As I commented there, I think that it must be a kernel or hardware
> issue --- Postgres itself can surely not make an unkillable process.
> However it's common to see processes that don't respond to kill if
> they are stuck inside a kernel I/O request. That could mean either
> unresponsive hardware or a kernel bug.

That is somewhat along the lines of what I was thinking, although I have
had no problems like this before. The machine has been running for over
100 days, and the database as well, without issue.

28424 postgres 18 0 16804 3044 15m D 0.0 1.6 0:06.72 postmaster

Note that it does have a process status of 'D', or uninterruptible
sleep. That would explain the unkillable part, though I'm curious how
it ended up there. Unless it just happened to be in a really bad spot
when Postgres segfaulted. . . although, I wouldn't expect that would
affect the 'startup subprocess'.
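
For anyone wanting to check for the same symptom, here is a minimal sketch that lists processes currently in state 'D'. It assumes a Linux /proc filesystem where the third field of /proc/<pid>/stat is the process state; the function names are made up for illustration and are not part of Postgres or any system tool.

```python
import os


def parse_stat_state(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return a list of (pid, comm) for processes in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or entry unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A process that stays in this list across repeated runs is blocked inside the kernel, which is why signals, including SIGKILL, are not delivered to it until the kernel call returns.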

> I wonder whether you have any similarities in hardware or Linux kernel
> to the person who filed the above report?

Here's all the information I can provide for this machine:

IBM IntelliStation Z Pro
Model: 6899-12U
Dual Pentium Pro 200
192MB RAM
4.5 GB IBM SCSI HDD
9 GB IBM SCSI HDD
6.4 GB WD HDD

The database resides on the 4.5 GB SCSI, with the pg_xlog directory
symlinked from there, and actually existing on the 9GB SCSI.
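
To confirm which physical device each directory really lives on, something like the following sketch could be used. The data directory path is a hypothetical placeholder (it is not given above), and the helper names are invented for illustration.

```python
import os


def describe_location(path):
    """Report whether path is a symlink, where it resolves to, and its device id."""
    real = os.path.realpath(path)
    return {
        "path": path,
        "is_symlink": os.path.islink(path),
        "real_path": real,
        "device": os.stat(real).st_dev,  # device id of the filesystem holding it
    }


def same_device(path_a, path_b):
    """True if both paths resolve to the same filesystem device."""
    return describe_location(path_a)["device"] == describe_location(path_b)["device"]


if __name__ == "__main__":
    data_dir = "/var/lib/postgres/data"  # hypothetical path, adjust as needed
    for p in (data_dir, os.path.join(data_dir, "pg_xlog")):
        try:
            print(describe_location(p))
        except OSError as exc:
            print(p, "->", exc)
```

If the two device ids differ, I/O stuck on pg_xlog points at a different disk than I/O stuck on the main data directory, which may help narrow down a hardware problem.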

nexus:~$ uname -a
Linux nexus.zyp.org 2.6.4 #1 SMP Thu Mar 11 14:04:49 CST 2004 i686 GNU/Linux
nexus:~$ uptime
21:15:39 up 107 days, 20:57, 7 users, load average: 2.04, 2.31, 2.38

If there's any other information I can provide, please let me know.

I'm going to reboot the box right now, and cross my fingers, hoping
it'll come back up. ;-)

> regards, tom lane

--
| Christopher
+------------------------------------------------+
| Here I stand. I can do no other. |
+------------------------------------------------+
