Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: recovery is stuck when children are not processing SIGQUIT from previous crash
Date: 2009-11-12 12:02:21
Message-ID: 1258027341.26305.18.camel@fsopti579.F-Secure.com
Lists: pgsql-admin pgsql-hackers

On Sat, 2009-09-26 at 12:19 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > strace on the backend processes all showed them waiting at
> > futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
> > Notably, the first argument was the same for all of them.
>
> Probably means they are blocked on semaphores. Stack traces would
> be much more informative ...

Got one now:

#0 0x00007f65951eaf8e in ?? () from /lib/libc.so.6
#1 0x00007f65951dc218 in ?? () from /lib/libc.so.6
#2 0x00007f65951dbcdd in __vsyslog_chk () from /lib/libc.so.6
#3 0x00007f65951dc1a0 in syslog () from /lib/libc.so.6
#4 0x00000000006694bd in EmitErrorReport () at elog.c:1404
#5 0x0000000000669935 in errfinish (dummy=-1790575472) at elog.c:415
#6 0x00000000005c291e in quickdie (postgres_signal_arg=<value optimized out>) at postgres.c:2502
#7 <signal handler called>
#8 0x00007f65951e0513 in send () from /lib/libc.so.6
#9 0x00007f65951dbeed in __vsyslog_chk () from /lib/libc.so.6
#10 0x00007f65951dc1a0 in syslog () from /lib/libc.so.6
#11 0x00000000006694bd in EmitErrorReport () at elog.c:1404
#12 0x0000000000669935 in errfinish (dummy=3) at elog.c:415
#13 0x00000000005c291e in quickdie (postgres_signal_arg=<value optimized out>) at postgres.c:2502
#14 <signal handler called>
#15 0x00007f65951e0303 in recv () from /lib/libc.so.6
#16 0x00000000005486a8 in secure_read (port=0x24a76f0, ptr=0x9ac680, len=8192) at be-secure.c:319
#17 0x000000000054f3d0 in pq_recvbuf () at pqcomm.c:754
#18 0x000000000054f817 in pq_getbyte () at pqcomm.c:795
#19 0x00000000005c4d10 in PostgresMain (argc=4, argv=<value optimized out>, username=0x2478728 "xyz") at postgres.c:317
#20 0x000000000059938d in ServerLoop () at postmaster.c:3218
#21 0x000000000059a0cf in PostmasterMain (argc=5, argv=0x24731d0) at postmaster.c:1031
#22 0x0000000000551245 in main (argc=5, argv=<value optimized out>) at main.c:188

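Looks like a race condition or lockup in the syslog code: quickdie() was
invoked a second time (frame #6) while the first invocation (frame #13) was
still inside syslog(), and the nested syslog() call appears to be stuck.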
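
To illustrate the kind of lockup the trace suggests, here is a minimal sketch. It assumes the logging routine takes a non-recursive internal lock, which the backtrace does not prove. All names (toy_log, toy_handler, run_demo, toy_lock_held) are made up for illustration and are not PostgreSQL or libc functions, SIGINT merely stands in for SIGQUIT, and a plain flag stands in for the lock so that the program records the problem instead of actually hanging. For brevity the first call is made from ordinary code rather than from a handler as in the trace; the hazard is the same.

```c
#include <signal.h>

/* Hypothetical stand-ins: a flag models a non-recursive lock that a
 * logging routine might hold internally. */
static volatile sig_atomic_t toy_lock_held = 0;
static volatile sig_atomic_t would_deadlock = 0;

/* Toy logging routine. If simulate_signal is nonzero, a signal is
 * delivered while the "lock" is held, as if it had arrived in the
 * middle of the routine's send() call. */
static void toy_log(int simulate_signal)
{
    if (toy_lock_held)
    {
        /* A real non-recursive lock would block here forever: the
         * holder is the interrupted code lower on this same stack,
         * which cannot resume until this call returns. */
        would_deadlock = 1;
        return;
    }
    toy_lock_held = 1;
    if (simulate_signal)
        raise(SIGINT);      /* handler runs before raise() returns */
    toy_lock_held = 0;
}

/* Signal handler that logs, i.e. calls a routine that is not safe to
 * re-enter from a handler. */
static void toy_handler(int signo)
{
    (void) signo;
    toy_log(0);
}

/* Returns 1 if the handler re-entered toy_log() while the lock was
 * held, 0 if not, -1 if the handler could not be installed. */
int run_demo(void)
{
    if (signal(SIGINT, toy_handler) == SIG_ERR)
        return -1;
    would_deadlock = 0;
    toy_log(1);
    signal(SIGINT, SIG_DFL);    /* restore default disposition */
    return would_deadlock;
}
```

Calling run_demo() should report the re-entry. With a real blocking lock in place of the flag, the process would instead wait indefinitely, which would be consistent with the futex wait seen in strace.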
