Re: Postmaster hangs

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Karen Pease <meme(at)daughtersoftiresias(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Postmaster hangs
Date: 2009-10-27 06:24:24
Message-ID: 1256624664.1709.80.camel@wallace.localnet
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, 2009-10-27 at 00:50 -0500, Karen Pease wrote:
> > OK, so there's nothing shrieklingly obviously wrong with what the
> > postmaster is up to. But what about the backend that's stopped
> > responding? Try connecting gdb to that "postgres" process once it's
> > stopped responding and get a backtrace from that.
> >
>
> Okay -- I started up a psql instance, which immediately locks up. I
> then attached gdb to it and got this:
>
> (gdb) cont
> Continuing.
> ^C
> Program received signal SIGINT, Interrupt.
> 0x00fe2416 in __kernel_vsyscall ()

You didn't actually request a backtrace (bt), so all it shows is the top
stack frame. That doesn't tell us anything except that it's busy in a
system call in the kernel.

> > You can find out a bit more about what the kernel is doing using the
> > "magic" keyboard sequence "ALT-SysRQ-T" from a vconsole (not under X).
>
> Nothing happened. Nothing useful in dmesg -- certainly no stacktraces.

Your kernel might not have the "magic sysrq key" enabled. Run:

sudo sysctl -w kernel.sysrq=1

and try again. Note that on some systems with weird keyboards you might
have to hold the "Fn" key (if you have one) or disable "F-Lock" (if you
have it) to get SysRq to be recognised. The print screen / PrtScn key is
usually shared with SysRq even if it's not marked as such.

Hmm, it looks like the SysRq magic key sequences even work under X. I
didn't think they did, but on my system here hitting alt-sysrq-t under
X11 dumps a bunch of task trace data in to /var/log/kern.log (Ubuntu
system), including task info and some general info on the CPU states.

Oct 27 14:20:19 wallace kernel: [13668207.501781] postgres S 28e389e2 0 3152 31105
Oct 27 14:20:19 wallace kernel: [13668207.501781] f352bcac 00200086 c3634358 28e389e2 00029346 00000000 c069a340 c07d2180
Oct 27 14:20:19 wallace kernel: [13668207.501781] c51fe480 c51fe6f8 c3634300 ffffffff c3634300 00000000 c3634300 c51fe6f8
Oct 27 14:20:19 wallace kernel: [13668207.501781] f352bca8 c01398c8 c90e0000 7fffffff c90e01a8 f352bcf8 c050ec7d f352bcc8
Oct 27 14:20:19 wallace kernel: [13668207.501781] Call Trace:
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c01398c8>] ? enqueue_task_fair+0x68/0x70
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c050ec7d>] schedule_timeout+0xad/0xe0
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c0156aea>] ? prepare_to_wait+0x3a/0x70
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c04aaf38>] unix_stream_data_wait+0x88/0xe0
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c0156890>] ? autoremove_wake_function+0x0/0x50
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c04ac631>] unix_stream_recvmsg+0x311/0x490
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c02b0990>] ? apparmor_socket_recvmsg+0x10/0x20
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c04356e4>] sock_recvmsg+0xf4/0x120
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c0156890>] ? autoremove_wake_function+0x0/0x50
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c01c3bb7>] ? __mem_cgroup_uncharge_common+0x137/0x170
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c01a39b8>] ? __dec_zone_page_state+0x18/0x20
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c01b0411>] ? page_remove_rmap+0x61/0x130
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c012eb00>] ? kunmap_atomic+0x50/0x60
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c043599c>] sys_recvfrom+0x7c/0xd0
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c019c305>] ? __pagevec_free+0x25/0x30
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c019eee0>] ? release_pages+0x160/0x1a0
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c02d2bfd>] ? rb_erase+0xcd/0x150
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c0435a26>] sys_recv+0x36/0x40
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c0435be7>] sys_socketcall+0x1b7/0x2b0
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c014617b>] ? sys_gettimeofday+0x2b/0x70
Oct 27 14:20:19 wallace kernel: [13668207.501781] [<c0109eef>] sysenter_do_call+0x12/0x2f

BTW, if you do get kernel task trace information and you decide to
redact it, it'd be good to include at least all your `postgres'
instances, all `httpd' instances, and the summary information at the
end.

--
Craig Ringer

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Hemanta Kumar Biswal 2009-10-27 12:27:40 BUG #5138: Failed to get System metrics for terminal services:87
Previous Message Karen Pease 2009-10-27 05:50:29 Re: Postmaster hangs