Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: recovery is stuck when children are not processing SIGQUIT from previous crash
Date: 2009-09-25 15:46:25
Message-ID: 1253893585.26523.15.camel@fsopti579.F-Secure.com
Lists: pgsql-admin pgsql-hackers

On Wed, 2009-09-23 at 10:04 -0400, Tom Lane wrote:
> I'd prefer not to go there, at least not without a demonstration that
> this will solve a bug that's unsolvable otherwise. If a child is
> really stuck in a state that doesn't accept SIGQUIT, it probably
> won't accept SIGKILL either (e.g., uninterruptible disk wait). Or maybe
> we just have some errant code that is blocking SIGQUIT; but that's
> a garden variety bug IMO, not something that needs major new postmaster
> logic to work around.

strace on the backend processes all showed them waiting at

futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL

Notably, the first argument was the same for all of them.

I gather that a futex is a Linux kernel thing, which is probably then
used by glibc to implement some pthreads stuff. Anyone know more?

But yes, using SIGKILL on these processes works without problem.
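
A minimal sketch of that difference between the two signals, assuming a POSIX system and Python 3. The child process, the `CHILD_SOURCE` string, and the `quit_then_kill` helper are hypothetical stand-ins, not PostgreSQL code: the child simply blocks SIGQUIT the way errant code might, and then sleeps. It does not reproduce the futex wait seen in strace; it only shows that a process which never acts on SIGQUIT can still be removed with SIGKILL, which cannot be blocked or caught.

```python
import signal
import subprocess
import sys

# Hypothetical child: blocks SIGQUIT and then sleeps, standing in for a
# backend that never processes the postmaster's SIGQUIT.
CHILD_SOURCE = (
    "import signal, sys, time\n"
    "signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGQUIT})\n"
    "sys.stdout.write('ready\\n')\n"
    "sys.stdout.flush()\n"
    "time.sleep(60)\n"
)


def quit_then_kill(grace_seconds=0.5):
    """Send SIGQUIT, wait briefly, then fall back to SIGKILL.

    Returns (survived_quit, exit_code). A negative exit_code means the
    child was terminated by that signal number.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE], stdout=subprocess.PIPE
    )
    child.stdout.readline()  # wait until the child has blocked SIGQUIT

    child.send_signal(signal.SIGQUIT)
    try:
        child.wait(timeout=grace_seconds)
        survived_quit = False
    except subprocess.TimeoutExpired:
        survived_quit = True  # SIGQUIT stayed pending, child is still alive

    child.kill()  # SIGKILL cannot be blocked, caught, or ignored
    exit_code = child.wait(timeout=5)
    child.stdout.close()
    return survived_quit, exit_code


if __name__ == "__main__":
    survived, code = quit_then_kill()
    print("survived SIGQUIT:", survived)
    print("exit code after SIGKILL:", code)
```

The grace period before escalating is an arbitrary choice for the sketch; whether the postmaster should escalate at all is the open question in this thread.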
