From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | pgsql-admin(at)postgresql(dot)org |
Subject: | Re: recovery is stuck when children are not processing SIGQUIT from previous crash |
Date: | 2009-09-23 14:04:21 |
Message-ID: | 21890.1253714661@sss.pgh.pa.us |
Lists: | pgsql-admin pgsql-hackers |
Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> I have observed the following situation a few times now (weeks or months
> apart), most recently with 8.3.7. Some postgres child process crashes.
> The postmaster notices and sends SIGQUIT to all other children. Once
> all other children have exited, it would enter recovery. But for some
> reason, some children are not processing the SIGQUIT signal and are
> basically just stuck. That means the whole database system is then
> stuck and won't continue without manual intervention. If I go in
> manually and SIGKILL the offending processes, everything proceeds
> normally, recovery finishes, and the system is up again.
We need some investigation into why that is happening.
> I haven't had the chance yet to analyze why the SIGQUIT signals are
> getting stuck. Be that as it may, it appears there are no provisions
> for this case. I couldn't find any documentation or previous reports on
> this sort of thing. One might imagine a feature where the postmaster
> resorts to throwing SIGKILLs around after a while, similar to how init
> scripts are sometimes set up.
I'd prefer not to go there, at least not without a demonstration that
this will solve a bug that's unsolvable otherwise. If a child is
really stuck in a state that doesn't accept SIGQUIT, it probably
won't accept SIGKILL either (e.g., uninterruptible disk wait). Or maybe
we just have some errant code that is blocking SIGQUIT; but that's
a garden-variety bug IMO, not something that needs major new postmaster
logic to work around.
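The two possibilities above (a child in uninterruptible wait versus a child that has SIGQUIT blocked) can usually be told apart on Linux by reading /proc/<pid>/status for the stuck process. The following is only a rough diagnostic sketch, not anything from the PostgreSQL source: the function names are made up for illustration, and it assumes the Linux procfs layout, where `State:` begins with a one-letter process state (`D` means uninterruptible sleep) and `SigBlk:` is a hex bitmask in which signal N occupies bit N-1.

```python
import signal


def parse_proc_status(text):
    """Return (state_letter, blocked_mask) from the text of /proc/<pid>/status."""
    state, blocked = None, 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "State" and value:
            state = value.split()[0]      # "D" from "D (disk sleep)"
        elif key == "SigBlk" and value:
            blocked = int(value, 16)      # hex bitmask of blocked signals
    return state, blocked


def is_blocked(mask, signo):
    # Signal N is represented by bit N-1 of the mask.
    return bool(mask & (1 << (signo - 1)))


def diagnose(text, signo=signal.SIGQUIT):
    """Classify a stuck process from its status text (hypothetical helper)."""
    state, blocked = parse_proc_status(text)
    if state == "D":
        return "uninterruptible wait"
    if is_blocked(blocked, signo):
        return "signal blocked"
    return "no obvious cause"


def diagnose_pid(pid):
    # Linux only: reads the live status file of the given process.
    with open(f"/proc/{pid}/status") as f:
        return diagnose(f.read())
```

Calling `diagnose_pid(pid)` on a child that ignores SIGQUIT would give a first hint about which case applies. A result of "no obvious cause" does not rule out a bug; it only means that neither of the two conditions checked here was visible at that moment.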
regards, tom lane