Unkillable Backend Processes

From: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Unkillable Backend Processes
Date: 2006-05-23 00:53:00
Message-ID: 3BB218ED-FD62-42BF-8BDD-066954F961AC@sitening.com
Lists: pgsql-admin

I've encountered an oddity on a postgres cluster that results in an
unresponsive postmaster and, frequently, unkillable backend
processes. I'm having a difficult time isolating the queries that are
related to this scenario because by the time the scenario occurs,
max_connections has been reached, and no superuser connections are
available. Because the query doesn't finish, I don't think it's
getting logged (since logging is only done at the query level on a
duration or error basis). In the current iteration, I can tell that
it's an INSERT that's causing the problem, and the INSERT is coming
from an Apache process on a machine on the same network. In recent
occurrences, though, I'm almost positive I've seen a SELECT.

But as troubled as I am by the cause, I'm similarly troubled by my
inability to treat the symptoms effectively. When this occurs, I have
tried shutting down the pgpools and postmaster (using pg_ctl).
Unfortunately, pgpool frequently hangs during the shutdown attempt.
When I kill the pgpool processes individually using kill and then
shut down the postmaster with pg_ctl in immediate mode, I will
occasionally find a backend process that cannot be killed, even with
a KILL (-9) signal.

Is this likely to be caused by something at a lower level than postgres?

Here are the specs:

PostgreSQL 8.1.3
pgpool 3.0.1
Debian GNU/Linux 3.1
Linux 2.6.10 #8 SMP
system: ext3 RAID 1
WAL: jfs RAID 10
data: jfs RAID 10

There's also an NFS mount point.

I'm still trying to do the forensics on the root cause (a related
oddity: the system can run in production for days or weeks without
any issues), but I'm just as interested in why I can't kill postgres
backend processes that have no postmaster. If I can provide more
information related to recovery, please let me know.
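
One piece of information that may be worth collecting the next time this
happens is the kernel-reported state of each stuck backend. On Linux, a
process in uninterruptible sleep (state "D" in /proc/<pid>/stat) does not
act on any signal, including KILL, until the kernel call it is blocked in
returns. The following is only a rough sketch, assuming a Linux /proc
filesystem; the function names parse_stat and processes_in_state are
made up for illustration and are not part of PostgreSQL or pgpool.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat line into (pid, command name, state letter)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so locate it by the first "(" and the last ")".
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren].strip())
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def processes_in_state(wanted="D", proc_root="/proc"):
    """Return (pid, command name) pairs for processes in the wanted state.

    Hypothetical helper: assumes a Linux-style /proc layout under proc_root.
    """
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or its stat file was unreadable
        if state == wanted:
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    # List every process currently in uninterruptible sleep.
    for pid, comm in processes_in_state("D"):
        print(pid, comm)
```

If the unkillable backends show up in this list, that would suggest they
are blocked inside the kernel (for example on disk or NFS I/O) rather
than in postgres itself, though the state alone does not say which
device or mount is involved.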

--
Thomas F. O'Connell
Database Architecture and Programming
Sitening, LLC

http://www.sitening.com/
3004 B Poston Avenue
Nashville, TN 37203-1314
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)
