Re: random failing builds on spoonbill - backends not exiting...

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: random failing builds on spoonbill - backends not exiting...
Date: 2012-06-23 06:40:18
Message-ID: 4FE564D2.6000609@kaltenbrunner.cc
Lists: pgsql-hackers

On 06/22/2012 11:47 PM, Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>>>   PID  PENDING   CAUGHT  IGNORED  BLOCKED COMMAND
>>>> 12480 20004004 34084005 c942b002 fffefeff postgres: writer process
>>>>  9841 20004004 34084007 c942b000 fffefeff postgres: wal writer process
>
>> The pending signals seem to be SIGUSR1, SIGTERM and SIGQUIT
>
> OK, I looked up OpenBSD's signal numbers on the web. It looks to me
> like these two processes have everything blocked except KILL and STOP
> (which are unblockable of course). I do not see any place in the PG
> code that could possibly set such a mask (note that BlockSig should
> have more holes in it than that). So I'm thinking these must be
> blocked inside some system function that's installed a restrictive
> signal mask, or some such function forgot to restore the mask on exit.
> Could you gdb each of these processes and get a stack trace?
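
For reference, the hex masks from ps can be decoded mechanically. The following is a minimal sketch, assuming the usual BSD signal numbering and that signal N corresponds to bit N-1 of the mask; the helper names (signals_in_mask, signals_not_in_mask) are made up for illustration and are not part of PostgreSQL or OpenBSD:

```python
# BSD signal numbers (as used by OpenBSD); signal N is bit N-1 of a mask.
BSD_SIGNALS = {
    1: "SIGHUP", 2: "SIGINT", 3: "SIGQUIT", 4: "SIGILL", 5: "SIGTRAP",
    6: "SIGABRT", 7: "SIGEMT", 8: "SIGFPE", 9: "SIGKILL", 10: "SIGBUS",
    11: "SIGSEGV", 12: "SIGSYS", 13: "SIGPIPE", 14: "SIGALRM", 15: "SIGTERM",
    16: "SIGURG", 17: "SIGSTOP", 18: "SIGTSTP", 19: "SIGCONT", 20: "SIGCHLD",
    21: "SIGTTIN", 22: "SIGTTOU", 23: "SIGIO", 24: "SIGXCPU", 25: "SIGXFSZ",
    26: "SIGVTALRM", 27: "SIGPROF", 28: "SIGWINCH", 29: "SIGINFO",
    30: "SIGUSR1", 31: "SIGUSR2",
}


def signals_in_mask(mask_hex):
    """Return the names of signals whose bit is set in a hex mask string."""
    mask = int(mask_hex, 16)
    return [name for num, name in sorted(BSD_SIGNALS.items())
            if mask & (1 << (num - 1))]


def signals_not_in_mask(mask_hex):
    """Return the names of signals whose bit is clear in a hex mask string."""
    mask = int(mask_hex, 16)
    return [name for num, name in sorted(BSD_SIGNALS.items())
            if not mask & (1 << (num - 1))]


if __name__ == "__main__":
    # PENDING and BLOCKED columns reported for both stuck processes
    print("pending:    ", signals_in_mask("20004004"))
    print("not blocked:", signals_not_in_mask("fffefeff"))
```

Under those assumptions the PENDING value decodes to SIGQUIT, SIGTERM and SIGUSR1, and the only signals missing from the BLOCKED value are SIGKILL and SIGSTOP, which matches the reading above.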

background writer (12480):

(gdb) bt
#0 0x0000000208eb5928 in poll () from /usr/lib/libc.so.62.0
#1 0x000000020a972b88 in _thread_kern_poll (wait_reqd=Variable
"wait_reqd" is not available.
) at /usr/src/lib/libpthread/uthread/uthread_kern.c:784
#2 0x000000020a973d04 in _thread_kern_sched (scp=0x0) at
/usr/src/lib/libpthread/uthread/uthread_kern.c:384
#3 0x000000020a96b35c in poll (fds=0xfffffffffffefa80, nfds=Variable
"nfds" is not available.
) at /usr/src/lib/libpthread/uthread/uthread_poll.c:94
#4 0x0000000000395538 in WaitLatchOrSocket (latch=0x212bdc97c,
wakeEvents=25, sock=-1, timeout=10000) at pg_latch.c:286
#5 0x0000000000399800 in BackgroundWriterMain () at bgwriter.c:325
#6 0x0000000000201850 in AuxiliaryProcessMain (argc=2,
argv=0xfffffffffffefd98) at bootstrap.c:419
#7 0x00000000003a1534 in StartChildProcess (type=BgWriterProcess) at
postmaster.c:4518
#8 0x00000000003a7574 in reaper (postgres_signal_arg=Variable
"postgres_signal_arg" is not available.
) at postmaster.c:2385
#9 0x000000020a974528 in _dispatch_signal (sig=20,
scp=0xffffffffffff03e0) at /usr/src/lib/libpthread/uthread/uthread_sig.c:408
#10 0x000000020a97462c in _dispatch_signals (scp=0xffffffffffff03e0) at
/usr/src/lib/libpthread/uthread/uthread_sig.c:437
#11 0x000000020a974e28 in _thread_sig_handler (sig=20,
info=0xffffffffffff0420, scp=0xffffffffffff03e0) at
/usr/src/lib/libpthread/uthread/uthread_sig.c:139
#12 <signal handler called>
#13 _thread_kern_set_timeout (timeout=0xffffffffffff0630) at
/usr/src/lib/libpthread/uthread/uthread_kern.c:989
#14 0x000000020a96bc8c in select (numfds=9, readfds=0xffffffffffff0730,
writefds=0x0, exceptfds=0x0, timeout=Variable "timeout" is not available.
) at /usr/src/lib/libpthread/uthread/uthread_select.c:85
#15 0x00000000003a2894 in ServerLoop () at postmaster.c:1321
#16 0x00000000003a45ac in PostmasterMain (argc=Variable "argc" is not
available.
) at postmaster.c:1121
#17 0x0000000000326df8 in main (argc=6, argv=0xffffffffffff14f8) at
main.c:199

wal writer (9841):

#0 0x0000000208eb5928 in poll () from /usr/lib/libc.so.62.0
#1 0x000000020a972b88 in _thread_kern_poll (wait_reqd=Variable
"wait_reqd" is not available.
) at /usr/src/lib/libpthread/uthread/uthread_kern.c:784
#2 0x000000020a973d04 in _thread_kern_sched (scp=0x0) at
/usr/src/lib/libpthread/uthread/uthread_kern.c:384
#3 0x000000020a96b35c in poll (fds=0xfffffffffffefa80, nfds=Variable
"nfds" is not available.
) at /usr/src/lib/libpthread/uthread/uthread_poll.c:94
#4 0x0000000000395538 in WaitLatchOrSocket (latch=0x212bdc69c,
wakeEvents=25, sock=-1, timeout=5000) at pg_latch.c:286
#5 0x00000000003aa794 in WalWriterMain () at walwriter.c:301
#6 0x0000000000201878 in AuxiliaryProcessMain (argc=2,
argv=0xfffffffffffefd98) at bootstrap.c:430
#7 0x00000000003a1534 in StartChildProcess (type=WalWriterProcess) at
postmaster.c:4518
#8 0x00000000003a7564 in reaper (postgres_signal_arg=Variable
"postgres_signal_arg" is not available.
) at postmaster.c:2387
#9 0x000000020a974528 in _dispatch_signal (sig=20,
scp=0xffffffffffff03e0) at /usr/src/lib/libpthread/uthread/uthread_sig.c:408
#10 0x000000020a97462c in _dispatch_signals (scp=0xffffffffffff03e0) at
/usr/src/lib/libpthread/uthread/uthread_sig.c:437
#11 0x000000020a974e28 in _thread_sig_handler (sig=20,
info=0xffffffffffff0420, scp=0xffffffffffff03e0) at
/usr/src/lib/libpthread/uthread/uthread_sig.c:139
#12 <signal handler called>
#13 _thread_kern_set_timeout (timeout=0xffffffffffff0630) at
/usr/src/lib/libpthread/uthread/uthread_kern.c:989
#14 0x000000020a96bc8c in select (numfds=9, readfds=0xffffffffffff0730,
writefds=0x0, exceptfds=0x0, timeout=Variable "timeout" is not available.
) at /usr/src/lib/libpthread/uthread/uthread_select.c:85
#15 0x00000000003a2894 in ServerLoop () at postmaster.c:1321
#16 0x00000000003a45ac in PostmasterMain (argc=Variable "argc" is not
available.
) at postmaster.c:1121
#17 0x0000000000326df8 in main (argc=6, argv=0xffffffffffff14f8) at
main.c:199

Stefan
