recovery is stuck when children are not processing SIGQUIT from previous crash

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-admin(at)postgresql(dot)org
Subject: recovery is stuck when children are not processing SIGQUIT from previous crash
Date: 2009-09-23 11:21:31
Message-ID: 1253704891.20834.8.camel@fsopti579.F-Secure.com
Lists: pgsql-admin pgsql-hackers

I have observed the following situation a few times now (weeks or months
apart), most recently with 8.3.7. A postgres child process crashes.
The postmaster notices and sends SIGQUIT to all other children. Once
all other children have exited, it would normally enter recovery. But
for some reason, some children do not process the SIGQUIT signal and
simply hang. That means the whole database system is then
stuck and won't continue without manual intervention. If I go in
manually and SIGKILL the offending processes, everything proceeds
normally, recovery finishes, and the system is up again.

I haven't had the chance yet to analyze why these children fail to act
on SIGQUIT. Be that as it may, it appears there are no provisions
for this case. I couldn't find any documentation or previous reports on
this sort of thing. One might imagine a feature where the postmaster
resorts to throwing SIGKILLs around after a while, similar to how init
scripts are sometimes set up. But perhaps manual intervention is the
way to go.
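
To make the escalation idea concrete, here is a minimal sketch in Python
(not the postmaster's C, and not how the postmaster is actually
implemented). It sends SIGQUIT to a set of children, waits for a grace
period, and then SIGKILLs whatever is still alive. The function name
stop_children and the parameters grace_seconds and poll_interval are
made up for illustration, and the children are assumed to be
subprocess.Popen objects on a POSIX system.

```python
import signal
import time


def stop_children(children, grace_seconds=5.0, poll_interval=0.1):
    """Send SIGQUIT to every child, then SIGKILL the ones that linger.

    `children` is a list of subprocess.Popen objects (an assumption of
    this sketch). Returns the PIDs that had to be SIGKILLed.
    """
    # Step 1: ask every live child to quit.
    for child in children:
        if child.poll() is None:
            child.send_signal(signal.SIGQUIT)

    # Step 2: give them a bounded amount of time to exit on their own.
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if all(child.poll() is not None for child in children):
            return []
        time.sleep(poll_interval)

    # Step 3: escalate. SIGKILL cannot be caught, blocked, or ignored.
    killed = []
    for child in children:
        if child.poll() is None:
            child.kill()
            killed.append(child.pid)

    # Reap everything so no zombies are left behind.
    for child in children:
        child.wait()
    return killed
```

Calling stop_children(children, grace_seconds=5.0) returns an empty
list when every child honors SIGQUIT, and the PIDs of the stuck ones
otherwise. The grace period is the only real tuning knob: too short
and healthy children get killed mid-exit, too long and the outage
drags on.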

Comments?
