Re: [HACKERS] Function to kill backend

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>
Cc: "Magnus Hagander" <mha(at)sollentuna(dot)net>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "PostgreSQL-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Function to kill backend
Date: 2004-07-26 03:46:28
Message-ID: 18425.1090813588@sss.pgh.pa.us
Lists: pgsql-patches

"Dave Page" <dpage(at)vale-housing(dot)co(dot)uk> writes:
> I don't know the details of how it works, but is it any worse/better
> than 'kill -9' (which iirc is no longer considered an absolute no-no)?

What I've been trying to remind people of is that killing just a single
backend with SIGTERM is not the normal code path and can't be considered
well-tested. We know it works to shut down an entire cluster with
simultaneous SIGTERMs. However, in that situation the only correctness
requirement is that the final database state on disk be consistent.
We don't really *know* what state is being left behind in the shared
memory segment, because shmem just gets thrown away. It could be that
sometimes some locks don't get released, or in other ways a SIGTERM'd
backend fails to clean up after itself fully.

In comparison, the query-cancel code path is nearly indistinguishable
from any ordinary elog(ERROR). We can also have confidence that kill -9
on an individual backend is not going to screw things terribly, because
that simulates a backend crash, and the recovery path for that has been
(ahem) tested pretty frequently over the years. Note also that in the
kill -9 case, again only the final database state on disk matters, not
the condition of shared memory.
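
A minimal sketch of the three ways of signalling a single backend compared above. It assumes the usual PostgreSQL convention that SIGINT requests a query cancel; the message itself names only SIGTERM and kill -9. The enum and function names are hypothetical and are not PostgreSQL's own API:

```c
#include <signal.h>
#include <sys/types.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    BACKEND_CANCEL_QUERY,  /* query cancel: behaves much like an ordinary elog(ERROR) */
    BACKEND_TERMINATE,     /* session kill: the rarely exercised elog(FATAL)-style exit */
    BACKEND_CRASH          /* kill -9: looks like a backend crash, so crash recovery runs */
} BackendAction;

/* Map an action to the signal that would be sent; returns 0 if unknown. */
int signal_for_action(BackendAction action)
{
    switch (action) {
        case BACKEND_CANCEL_QUERY: return SIGINT;   /* assumption: SIGINT = cancel */
        case BACKEND_TERMINATE:    return SIGTERM;
        case BACKEND_CRASH:        return SIGKILL;
    }
    return 0;
}

/*
 * Send the signal to one backend process.
 * Returns 0 on success, -1 on failure (as kill(2) does).
 * Non-positive pids are rejected so the call can never address
 * a whole process group or every process.
 */
int signal_backend(pid_t pid, BackendAction action)
{
    int signo = signal_for_action(action);

    if (signo == 0 || pid <= 0)
        return -1;
    return kill(pid, signo);
}
```

Only the first and last of these exercise code paths that see regular use; the SIGTERM case sent to one backend is the path whose shared-memory cleanup is in question.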

Another way to look at this is that elog(FATAL) in general is not a
well-tested code path, because it hardly ever happens in the field.
The only elog(FATAL)s that get exercised with any regularity are the
ones that reject a connection request during authentication, and those
all occur *before* the backend has become a full-fledged backend and
acquired any resources it might need to release. The only elog(FATAL)
calls in an up-and-running backend are for "can't happen" conditions,
and by and large indeed those don't happen.

So what it comes down to is that we can put this feature out there
if we choose, but we'd be fooling ourselves to think we can consider it
reliable. Moreover, since the kinds of cases where you'd use a session
kill don't arise every day, I don't think we could say we'd acquire any
confidence in it over time either. It'd always remain a little-used
corner of the code, and little-used corners tend to gather bit rot.

If you don't mind plastering a "use at your own risk" sign on it, then
go for it.

regards, tom lane
