Re: Postmaster hangs

From: Karen Pease <meme(at)daughtersoftiresias(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Postmaster hangs
Date: 2009-10-26 03:47:27
Message-ID: 1256528847.25178.25.camel@localhost.localdomain
Lists: pgsql-bugs

kill -9 does kill postmaster (or at least seems to). But I can't figure
out a way to get it restarted without a reboot -- I don't know what I'm
missing. The Fedora postgres restart scripts don't do the trick, and I
couldn't get it to work with pg_ctl either.

kill -9 doesn't work on the locked-up httpd processes, though. Clearing
those requires a system restart.
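
For anyone following the thread: a process that survives kill -9 is usually sitting in uninterruptible sleep (state "D" in ps and top). Below is a minimal Python sketch, not something that was run in this thread, that lists such processes by reading the state field of /proc/<pid>/stat. The function names parse_stat_line and find_uninterruptible are made up for illustration, and the script assumes a Linux /proc layout.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    stuck = []
    if not os.path.isdir(proc_root):
        return stuck  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/version
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the file was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

A process can pass through state D briefly during ordinary I/O, so a single hit proves little. The same pid showing up across several runs a few seconds apart is the stronger hint that it is stuck in the kernel.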

[meme(at)chmmr]$ cat /proc/version
Linux version 2.6.27.37-170.2.104.fc10.i686 (mockbuild(at)xenbuilder4(dot)fedora(dot)phx(dot)redhat(dot)com) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #1 SMP Mon Oct 12 22:01:53 EDT 2009

Postgres is by default in /var/lib/pgsql. When / started running out of
space, I moved it to /scratch and symlinked:

lrwxrwxrwx 1 root root 15 2009-09-11 16:57 pgsql -> /scratch/pgsql//

/ is on md0 and is RAID-1. /scratch is on md1 and is RAID-6:

[meme(at)chmmr]$ df -h
Filesystem   Size  Used  Avail  Use%  Mounted on
/dev/md0      64G   42G    18G   71%  /
/dev/md1     2.5T  2.2T   239G   91%  /scratch
/dev/sdb1    190M   38M   143M   21%  /boot
/dev/sde1    190M   86M    95M   48%  /boot2
/dev/sdd1    190M   86M    95M   48%  /boot3
/dev/sda1    190M   86M    95M   48%  /boot4
/dev/sdc1    190M   86M    95M   48%  /boot5
tmpfs       1000M     0  1000M    0%  /dev/shm
[meme(at)chmmr]$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid6 sde4[0] sdc4[4] sda4[3] sdb4[2] sdd4[1]
      2722005120 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]

md0 : active raid1 sde3[0] sdc3[4] sda3[3] sdb3[2] sdd3[1]
      67119488 blocks [5/5] [UUUUU]

unused devices: <none>

Both filesystems are ext4.

Thanks for your help!

- Karen

On Sun, 2009-10-25 at 23:13 -0400, Tom Lane wrote:
> Karen Pease <meme(at)daughtersoftiresias(dot)org> writes:
> > It'll get through about three or four of them (out of hundreds) before
> > it locks up. Now, before lockup, postmaster is very active. It shows
> > up on top. The computer's hard drives clack nonstop. Etc. But once it
> > locks up (without warning), all of that stops. Postmaster does nothing.
> > The computer goes silent. I can't ctrl-break the psql process. If I
> > try to start a new psql process, it won't get past the password prompt
> > -- psql will hang. All Apache processes involving postgres queries
> > hang. The postgres server cannot be restarted by any normal means (the
> > only solution I've found that works is a reboot). And so forth.
>
> This sounds to me like it's a kernel problem, possibly triggered by
> misbehaving disk hardware. What you might try to confirm is a kill -9
> on whichever postgres backend seems to be stuck. If that fails to
> remove the process, then it's definitely a kernel issue --- try googling
> "uninterruptible disk wait" and similar phrases.
>
> The cases that I've run into personally have been due to poor error
> handling for a disk failure condition in a kernel-level disk driver.
> If that's what it is for you, the bottom-level problem might be an
> unreadable disk block somewhere. Or it might just be a garden variety
> kernel bug. What's the platform?
>
> regards, tom lane
>
