Re: Postmaster hangs

From: Karen Pease <meme(at)daughtersoftiresias(dot)org>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Postmaster hangs
Date: 2009-10-26 09:28:44
Message-ID: 1256549324.25178.37.camel@localhost.localdomain
Lists: pgsql-bugs

I did my best to follow the gdb instructions. I ran:

gdb -p 2852

Once connected, I entered the logging statements, ran "cont", and then
ctrl-c'ed it a couple of times. I got:

Program received signal SIGINT, Interrupt.
0x001e6416 in __kernel_vsyscall ()
(gdb) bt
#0 0x001e6416 in __kernel_vsyscall ()
#1 0x00c7939d in ___newselect_nocancel () from /lib/libc.so.6
#2 0x081dbaf9 in ?? ()
#3 0x081dd20a in PostmasterMain ()
#4 0x08190f96 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x001e6416 in __kernel_vsyscall ()
(gdb) bt
#0 0x001e6416 in __kernel_vsyscall ()
#1 0x00c7939d in ___newselect_nocancel () from /lib/libc.so.6
#2 0x081dbaf9 in ?? ()
#3 0x081dd20a in PostmasterMain ()
#4 0x08190f96 in main ()
(gdb) quit

The jammed httpd processes, as listed by your command line, are:

[root@chmmr dbscripts]# ps ax -o pid,ppid,stat,wchan:50,cmd | grep -i http
 3376     1 D    start_this_handle  /usr/sbin/httpd
 3379     1 D    start_this_handle  /usr/sbin/httpd
 3381     1 D    start_this_handle  /usr/sbin/httpd
 4147     1 D    start_this_handle  /usr/sbin/httpd
 4539     1 D    start_this_handle  /usr/sbin/httpd
 5484     1 D    start_this_handle  /usr/sbin/httpd
11100     1 D    start_this_handle  /usr/sbin/httpd
14882     1 D    start_this_handle  /usr/sbin/httpd

These cannot be killed by kill -9. Example:

[root@chmmr dbscripts]# kill -9 3376
[root@chmmr dbscripts]# ps ax -o pid,ppid,stat,wchan:50,cmd | grep -i http
 3376     1 D    start_this_handle  /usr/sbin/httpd
 3379     1 D    start_this_handle  /usr/sbin/httpd
 3381     1 D    start_this_handle  /usr/sbin/httpd
 4147     1 D    start_this_handle  /usr/sbin/httpd
 4539     1 D    start_this_handle  /usr/sbin/httpd
 5484     1 D    start_this_handle  /usr/sbin/httpd
11100     1 D    start_this_handle  /usr/sbin/httpd
14882     1 D    start_this_handle  /usr/sbin/httpd
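
For reference, a STAT of "D" in this output means uninterruptible sleep, which is why the SIGKILL has no visible effect: the signal stays pending until the kernel wait (here in start_this_handle) completes. Below is a minimal sketch, in Python, of filtering output of this shape down to the stuck processes. The names PsRow, parse_ps and uninterruptible are hypothetical helpers, not part of any tool, and the init row in SAMPLE is invented for contrast; only the two httpd rows come from the listing above.

```python
from typing import List, NamedTuple


class PsRow(NamedTuple):
    """One row of `ps ax -o pid,ppid,stat,wchan:50,cmd` output."""
    pid: int
    ppid: int
    stat: str
    wchan: str
    cmd: str


def parse_ps(text: str) -> List[PsRow]:
    """Parse ps output with the five columns pid, ppid, stat, wchan, cmd."""
    rows = []
    for line in text.splitlines():
        # Split into at most five fields so that cmd keeps its spaces.
        parts = line.split(None, 4)
        if len(parts) < 5 or not parts[0].isdigit():
            continue  # skip the header line and blank lines
        pid, ppid, stat, wchan, cmd = parts
        rows.append(PsRow(int(pid), int(ppid), stat, wchan, cmd))
    return rows


def uninterruptible(rows: List[PsRow]) -> List[PsRow]:
    """Keep only processes whose state code starts with D."""
    return [r for r in rows if r.stat.startswith("D")]


# Illustrative input: the init row is hypothetical, the httpd rows
# are copied from the listing in this message.
SAMPLE = """\
  PID  PPID STAT WCHAN              CMD
    1     0 Ss   -                  /sbin/init
 3376     1 D    start_this_handle  /usr/sbin/httpd
 3379     1 D    start_this_handle  /usr/sbin/httpd
"""

for row in uninterruptible(parse_ps(SAMPLE)):
    print(row.pid, row.wchan, row.cmd)
```

On a live system the text could instead come from subprocess.run(["ps", "ax", "-o", "pid,ppid,stat,wchan:50,cmd"], capture_output=True, text=True).stdout, which avoids the grep and also catches any stuck postgres backends.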

As mentioned, I can kill postmaster. But I can't restart it without a
reboot; it hangs:

[root@chmmr dbscripts]# ps -ef | grep -i postm
postgres  2852     1  0 Oct25 ?        00:00:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
root     15115 14844  0 04:23 pts/0    00:00:00 grep -i postm
[root@chmmr dbscripts]# /etc/init.d/postgresql stop
Stopping postgresql service: ^C^C [FAILED]
[root@chmmr dbscripts]#
[root@chmmr dbscripts]# killall -9 postmaster
[root@chmmr dbscripts]# ps -ef | grep -i postm
root     15183 14844  0 04:24 pts/0    00:00:00 grep -i postm
[root@chmmr dbscripts]# /etc/init.d/postgresql restart
Stopping postgresql service: ^C^C [FAILED]
^C
[root@chmmr dbscripts]# /etc/init.d/postgresql start
^C
^C

I have no better luck using pg_ctl directly than with the postgresql
control script.

Again I hope this helps. Thanks!

- Karen

On Mon, 2009-10-26 at 17:07 +0800, Craig Ringer wrote:
> Karen Pease wrote:
> > kill -9 does kill postmaster (or at least seems to). But I can't figure
> > out a way to get it restarted without a reboot -- I don't know what I'm
> > missing. The Fedora postgres restart scripts don't do the trick, and I
> > couldn't get it to work with pg_ctl either.
>
> It'd help to know where the postmaster was stuck, and if possible where
> the backend you were using is stuck.
>
> A backtrace from gdb can be handy for this.
>
> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
>
> > kill -9 doesn't work on the locked up httpd processes. So that has to
> > have the system restarted.
>
> If `kill -9' isn't working they're probably in uninterruptible sleep in
> the kernel.
>
> You can find out what they're sleeping in with `ps':
>
> ps ax -o pid,ppid,stat,wchan:50,cmd
>
> (Filter for just the postmaster and postgres processes if you want)
>
> > Both filesystems are EXT-4.
>
> That's interesting given the issues you're having...
>
> --
> Craig Ringer
