
Re: recovery is stuck when children are not processing SIGQUIT from previous crash

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	pgsql-admin(at)postgresql(dot)org
Subject:	Re: recovery is stuck when children are not processing SIGQUIT from previous crash
Date:	2009-09-23 14:04:21
Message-ID:	21890.1253714661@sss.pgh.pa.us
Lists:	pgsql-admin pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> I have observed the following situation a few times now (weeks or months
> apart), most recently with 8.3.7. Some postgres child process crashes.
> The postmaster notices and sends SIGQUIT to all other children. Once
> all other children have exited, it would enter recovery. But for some
> reason, some children are not processing the SIGQUIT signal and are
> basically just stuck. That means the whole database system is then
> stuck and won't continue without manual intervention. If I go in
> manually and SIGKILL the offending processes, everything proceeds
> normally, recovery finishes, and the system is up again.

We need some investigation into why that is happening.

> I haven't had the chance yet to analyze why the SIGQUIT signals are
> getting stuck. Be that as it may, it appears there are no provisions
> for this case. I couldn't find any documentation or previous reports on
> this sort of thing. One might imagine a feature where the postmaster
> resorts to throwing SIGKILLs around after a while, similar to how init
> scripts are sometimes set up.
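A minimal sketch of the escalation Peter describes ("SIGQUIT first, SIGKILL after a grace period", as init scripts do). It assumes plain POSIX semantics and uses Python's `os` and `signal` modules for brevity. `spawn_child` and `stop_child` are hypothetical helpers, not PostgreSQL code. The stand-in child exits with status 2 on SIGQUIT. It can optionally block SIGQUIT to mimic the "errant code" case discussed in the reply.

```python
import os
import signal
import time


def spawn_child(block_sigquit):
    """Fork a hypothetical stand-in 'backend' that idles until signalled.

    The child exits with status 2 on SIGQUIT. If block_sigquit is true it
    then masks SIGQUIT, simulating code that left the signal blocked.
    A pipe makes the parent wait until the child's setup is finished.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(r)
            signal.signal(signal.SIGQUIT, lambda signum, frame: os._exit(2))
            if block_sigquit:
                signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGQUIT})
            os.write(w, b"x")  # tell the parent we are ready
            while True:
                time.sleep(1)
        finally:
            os._exit(1)  # never fall back into the parent's code
    os.close(w)
    os.read(r, 1)  # block until the child reports it is ready
    os.close(r)
    return pid


def stop_child(pid, grace_seconds=5.0, poll_interval=0.05):
    """Send SIGQUIT; if the child is still alive after grace_seconds,
    escalate to SIGKILL. Returns (last signal sent, wait status).
    Hypothetical helper for illustration only.
    """
    os.kill(pid, signal.SIGQUIT)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return signal.SIGQUIT, status
        time.sleep(poll_interval)
    # Grace period expired: SIGKILL cannot be caught, blocked, or ignored.
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    return signal.SIGKILL, status
```

A child that honours SIGQUIT is reaped within the grace period. One that has SIGQUIT blocked only goes away after the SIGKILL. This is the behaviour Peter reproduces by hand. The sketch cannot help a process that is stuck in a state where SIGKILL itself is deferred.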

I'd prefer not to go there, at least not without a demonstration that
this will solve a bug that's unsolvable otherwise. If a child is
really stuck in a state that doesn't accept SIGQUIT, it probably
won't accept SIGKILL either (e.g., uninterruptible disk wait). Or maybe
we just have some errant code that is blocking SIGQUIT; but that's
a garden variety bug IMO, not something that needs major new postmaster
logic to work around.
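The two explanations above can be told apart from outside the stuck process. The following sketch is Linux-only and assumes the `/proc/<pid>/stat` and `/proc/<pid>/status` formats. `diagnose` is a hypothetical helper. It reports whether a child is in uninterruptible sleep (state `D`) or simply has SIGQUIT blocked and pending.

```python
import os
import signal


def diagnose(pid):
    """Report why a process might not be acting on SIGQUIT (Linux only).

    Hypothetical helper. Returns the scheduler state letter from
    /proc/<pid>/stat ('D' = uninterruptible sleep, e.g. disk wait) and
    whether SIGQUIT is blocked or pending per the masks in /proc/<pid>/status.
    """
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name is parenthesised and may contain spaces, so take
    # the state letter from just after the last ')'.
    state = stat[stat.rindex(")") + 2]

    masks = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("SigBlk", "SigPnd", "ShdPnd"):
                masks[key] = int(value.strip(), 16)

    bit = 1 << (signal.SIGQUIT - 1)  # signal N is bit N-1 in each hex mask
    return {
        "state": state,
        "sigquit_blocked": bool(masks["SigBlk"] & bit),
        # SigPnd: thread-directed; ShdPnd: process-directed (e.g. kill()).
        "sigquit_pending": bool((masks["SigPnd"] | masks["ShdPnd"]) & bit),
    }
```

A SIGQUIT that shows up as both blocked and pending points at the "garden variety bug". A state of `D` points at an uninterruptible wait, where a SIGKILL would likewise stay pending until the wait completes.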

regards, tom lane

In response to

recovery is stuck when children are not processing SIGQUIT from previous crash at 2009-09-23 11:21:31 from Peter Eisentraut

Responses

Re: recovery is stuck when children are not processing SIGQUIT from previous crash at 2009-09-25 15:46:25 from Peter Eisentraut
