Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [ADMIN] recovery is stuck when children are not processing SIGQUIT from previous crash
Date: 2009-12-09 13:50:54
Message-ID: 1260366654.8753.2.camel@fsopti579.F-Secure.com
Lists: pgsql-admin pgsql-hackers

[moved to -hackers]

On Thu, 2009-11-12 at 09:35 -0500, Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> >>> strace on the backend processes all showed them waiting at
> >>> futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
> >>> Notably, the first argument was the same for all of them.
>
> > Looks like a race condition or lockup in the syslog code.
>
> Hm, why are there two <signal handler> calls in the stack?
> The only thing I can think of is that we sent SIGQUIT twice.
> That's probably bad --- is there any obvious path through
> the postmaster that would do that?
>
> The other thought is that quickdie should block signals before
> starting to do anything.

Right. This would actually already work because a signal is blocked
while its handler runs, except that we start quickdie() with

    PG_SETMASK(&BlockSig);

which blocks everything except SIGQUIT. That should probably be fixed
in any case.
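
The effect described above can be reproduced outside PostgreSQL with a minimal sketch, assuming a POSIX system. All names here (demo_quickdie, max_nesting, the counters) are hypothetical and are not PostgreSQL code. The mask built in the handler only approximates BlockSig as characterized in this message (everything blocked except SIGQUIT). The handler raises a second SIGQUIT while it is still running and records how deeply it ends up nested.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical demo state; none of these names exist in PostgreSQL. */
static volatile sig_atomic_t depth = 0;       /* current handler nesting */
static volatile sig_atomic_t max_depth = 0;   /* deepest nesting observed */
static volatile sig_atomic_t sent_second = 0; /* second SIGQUIT already raised? */
static volatile sig_atomic_t reset_mask = 0;  /* 1: replace the mask on entry */

/* Stand-in for quickdie(); only async-signal-safe calls are used. */
static void demo_quickdie(int signo)
{
    (void) signo;
    depth++;
    if (depth > max_depth)
        max_depth = depth;

    if (reset_mask)
    {
        /*
         * Assumption: mimic PG_SETMASK(&BlockSig) with a mask that blocks
         * everything except SIGQUIT. SIG_SETMASK replaces the whole mask,
         * so the implicit block on SIGQUIT during its handler is lost.
         */
        sigset_t mask;

        sigfillset(&mask);
        sigdelset(&mask, SIGQUIT);
        sigprocmask(SIG_SETMASK, &mask, NULL);
    }

    if (!sent_second)
    {
        /* Simulate a second SIGQUIT arriving while the handler runs. */
        sent_second = 1;
        raise(SIGQUIT);
    }

    depth--;
}

/* Install the handler, send SIGQUIT once, and report the deepest nesting. */
static int max_nesting(int mimic_setmask)
{
    struct sigaction sa, old_sa;
    sigset_t quit_only, old_mask;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_quickdie;
    sigemptyset(&sa.sa_mask);           /* no SA_NODEFER: SIGQUIT is blocked in its handler */
    rc = sigaction(SIGQUIT, &sa, &old_sa);
    assert(rc == 0);
    (void) rc;

    /* Make sure SIGQUIT is deliverable, remembering the previous mask. */
    sigemptyset(&quit_only);
    sigaddset(&quit_only, SIGQUIT);
    sigprocmask(SIG_UNBLOCK, &quit_only, &old_mask);

    depth = 0;
    max_depth = 0;
    sent_second = 0;
    reset_mask = mimic_setmask;
    raise(SIGQUIT);

    /* Restore the previous mask and disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGQUIT, &old_sa, NULL);
    return (int) max_depth;
}
```

With the mask left alone, max_nesting(0) should report a depth of 1: the second SIGQUIT stays pending and is delivered only after the first handler invocation returns. With the mask replaced on entry, max_nesting(1) should report 2, because the second SIGQUIT interrupts the handler itself, which would match the two <signal handler> frames in the stack.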
