pgsql: Install a "dead man switch" to allow the postmaster to detect

From: tgl(at)postgresql(dot)org (Tom Lane)
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Install a "dead man switch" to allow the postmaster to detect
Date: 2009-05-05 19:59:00
Message-ID: 20090505195900.B91267540FC@cvs.postgresql.org
Lists: pgsql-committers

Log Message:
-----------
Install a "dead man switch" to allow the postmaster to detect cases where
a backend has done exit(0) or exit(1) without having disengaged itself
from shared memory. We are at risk for this whenever third-party code is
loaded into a backend, since such code might not know it's supposed to go
through proc_exit() instead. Also, it is reported that under Windows
there are ways to externally kill a process that cause the status code
returned to the postmaster to be indistinguishable from a voluntary exit
(thank you, Microsoft). If this does happen then the system is probably
hosed --- for instance, the dead session might still be holding locks.
So the best recovery method is to treat this like a backend crash.

The dead man switch is armed for a particular child process when it
acquires a regular PGPROC, and disarmed when the PGPROC is released;
these should be the first and last touches of shared memory resources
in a backend, or close enough anyway. This choice means there is no
coverage for auxiliary processes, but I doubt we need that, since they
shouldn't be executing any user-provided code anyway.
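
The arm/disarm protocol described above can be sketched roughly as follows.
This is a minimal illustration, not the code from the patch: the slot array,
its size, and all function names here are hypothetical, and a plain static
array stands in for what would really be a shared memory structure. The idea
is that each child's slot is marked active while it holds a PGPROC, so the
postmaster can tell at child exit whether the switch was disarmed.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHILD_SLOTS 8		/* hypothetical limit for this sketch */

typedef enum
{
	SLOT_UNUSED,				/* no child owns this slot */
	SLOT_ASSIGNED,				/* child exists, switch disarmed */
	SLOT_ACTIVE					/* child holds a PGPROC, switch armed */
} SlotState;

/* Stand-in for a shared memory array; zero-initialized to SLOT_UNUSED. */
static SlotState child_slots[MAX_CHILD_SLOTS];

/* Postmaster side: reserve a slot before forking; returns -1 if none free. */
int
assign_child_slot(void)
{
	for (int i = 0; i < MAX_CHILD_SLOTS; i++)
	{
		if (child_slots[i] == SLOT_UNUSED)
		{
			child_slots[i] = SLOT_ASSIGNED;
			return i;
		}
	}
	return -1;
}

/* Child side: arm the switch when the PGPROC is acquired. */
void
mark_child_active(int slot)
{
	assert(slot >= 0 && slot < MAX_CHILD_SLOTS);
	child_slots[slot] = SLOT_ACTIVE;
}

/* Child side: disarm the switch when the PGPROC is released. */
void
mark_child_inactive(int slot)
{
	assert(slot >= 0 && slot < MAX_CHILD_SLOTS);
	child_slots[slot] = SLOT_ASSIGNED;
}

/*
 * Postmaster side: called after a child has exited.  Frees the slot and
 * reports whether the exit must be treated as a crash: either the status
 * is something other than exit(0)/exit(1), or the switch was still armed.
 */
bool
child_exit_is_crash(int exit_status, int slot)
{
	bool		was_disarmed;

	assert(slot >= 0 && slot < MAX_CHILD_SLOTS);
	was_disarmed = (child_slots[slot] == SLOT_ASSIGNED);
	child_slots[slot] = SLOT_UNUSED;

	if (exit_status != 0 && exit_status != 1)
		return true;
	return !was_disarmed;
}
```

In this sketch a backend that goes through the normal exit path calls
mark_child_inactive() before exiting, so the postmaster sees a disarmed
switch. A backend that calls exit(0) directly leaves the slot marked
active, and the postmaster treats the exit as a crash.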

This patch also improves the management of the EXEC_BACKEND
ShmemBackendArray array a bit, by reducing search costs.
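
The message does not say how the search costs are reduced. One common way
to do it, shown here purely as an assumption, is to index the array directly
by a per-child slot number instead of scanning it for a matching PID. The
struct layout, array size, and function names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BACKENDS 4			/* hypothetical size for this sketch */

typedef struct
{
	long		pid;			/* 0 means the entry is free */
	long		cancel_key;
} BackendEntry;

static BackendEntry backend_array[MAX_BACKENDS];

/* Record a backend in the entry that belongs to its slot. */
bool
store_backend(int slot, long pid, long cancel_key)
{
	if (slot < 0 || slot >= MAX_BACKENDS)
		return false;
	backend_array[slot].pid = pid;
	backend_array[slot].cancel_key = cancel_key;
	return true;
}

/* Linear scan: cost grows with the number of entries. */
BackendEntry *
find_backend_by_scan(long pid)
{
	for (int i = 0; i < MAX_BACKENDS; i++)
	{
		if (backend_array[i].pid == pid)
			return &backend_array[i];
	}
	return NULL;
}

/* Direct lookup: constant cost once the slot number is known. */
BackendEntry *
find_backend_by_slot(int slot)
{
	if (slot < 0 || slot >= MAX_BACKENDS)
		return NULL;
	return backend_array[slot].pid != 0 ? &backend_array[slot] : NULL;
}
```

If every child already carries a slot number for the dead man switch, the
same number could serve as the array index, which is why the two changes
fit naturally together in one patch.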

Although this problem is of long standing, the lack of field complaints
seems to mean it's not critical enough to risk back-patching; at least
not till we get some more testing of this mechanism.

Modified Files:
--------------
pgsql/src/backend/postmaster:
    postmaster.c (r1.580 -> r1.581)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/postmaster/postmaster.c?r1=1.580&r2=1.581)
pgsql/src/backend/storage/ipc:
    ipci.c (r1.99 -> r1.100)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/storage/ipc/ipci.c?r1=1.99&r2=1.100)
    pmsignal.c (r1.26 -> r1.27)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/storage/ipc/pmsignal.c?r1=1.26&r2=1.27)
pgsql/src/backend/storage/lmgr:
    proc.c (r1.205 -> r1.206)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/storage/lmgr/proc.c?r1=1.205&r2=1.206)
pgsql/src/backend/utils/init:
    globals.c (r1.107 -> r1.108)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/init/globals.c?r1=1.107&r2=1.108)
pgsql/src/include:
    miscadmin.h (r1.209 -> r1.210)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/miscadmin.h?r1=1.209&r2=1.210)
pgsql/src/include/postmaster:
    postmaster.h (r1.19 -> r1.20)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/postmaster/postmaster.h?r1=1.19&r2=1.20)
pgsql/src/include/storage:
    pmsignal.h (r1.23 -> r1.24)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/storage/pmsignal.h?r1=1.23&r2=1.24)
