Re: kill -KILL: What happens?

From: David Fetter <david(at)fetter(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: kill -KILL: What happens?
Date: 2011-01-13 17:12:35
Message-ID: 20110113171235.GA28078@fetter.org
Lists: pgsql-hackers

On Thu, Jan 13, 2011 at 10:41:28AM -0500, Tom Lane wrote:
> David Fetter <david(at)fetter(dot)org> writes:
> > I've noticed over the years that we give people dire warnings never to
> > send a KILL signal to the postmaster, but I'm unsure as to what are
> > potential consequences of this, as in just exactly how this can result
> > in problems. Is there some reference I can look to for explanations
> > of the mechanism(s) whereby the damage occurs?
>
> There's no risk of data corruption, if that's what you're thinking of.
> It's just that you're then looking at having to manually clean up the
> child processes and then restart the postmaster; a process that is not
> only tedious but does offer the possibility of screwing yourself.

Does this mean that there's no cross-platform way to ensure that
killing a process results in its children's timely death (i.e. before
damage can occur)? Or is there such a way, but one that isn't
practical from a performance point of view?
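
To make the question concrete, here is a minimal sketch of one portable
(POSIX) approach: each child remembers its parent's PID at startup and
periodically checks whether it has been re-parented. This is only an
illustration, not a description of what the postmaster's children
actually do; the function name is hypothetical, and the polling
interval is exactly the timeliness-versus-overhead trade-off asked
about above.

```python
import os


def parent_has_changed(original_ppid):
    """Return True once this process has been re-parented.

    On POSIX systems an orphaned child is adopted by another process
    (typically init), so os.getppid() stops matching the PID recorded
    at startup. Hypothetical helper, for illustration only.
    """
    return os.getppid() != original_ppid


# Typical use inside a child's main loop (sketch):
#   startup_ppid = os.getppid()
#   while not parent_has_changed(startup_ppid):
#       do_one_unit_of_work()
```

A child only notices at its next check, so this bounds the delay rather
than eliminating it.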

> In particular the risk is that someone clueless enough to do this would
> next decide that removing $PGDATA/postmaster.pid, rather than killing
> all the existing children, is the quickest way to get the postmaster
> restarted. Once he's done that, his data will shortly be hosed beyond
> recovery, because now he has two noncommunicating sets of backends
> massaging the same files via separate sets of shared buffers.

Right.

> The reason this sequence of events doesn't seem improbable is that the
> error you get when you try to start a new postmaster, if there are still
> old backends running, is
>
> FATAL: pre-existing shared memory block (key 5490001, ID 15609) is still in use
> HINT: If you're sure there are no old server processes still running, remove the shared memory block or just delete the file "postmaster.pid".
>
> Maybe we should rewrite that HINT --- while it's *possible* that
> removing the shmem block or deleting postmaster.pid is the right thing
> to do, it's not exactly *likely*. I think we need to put a bit more
> emphasis on the "If ..." part. Like "If you are prepared to swear on
> your mother's grave that there are no old server processes still
> running, consider removing postmaster.pid. But first check for existing
> processes again."

Maybe the hint could give an OS-tailored way to check this...
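
As a rough idea of what a Linux-specific check could look like, the
sketch below scans a /proc-style tree for processes whose current
working directory is the data directory. It rests on two assumptions:
that server processes run with the data directory as their working
directory, and that the caller has permission to read the other
processes' /proc entries. The function name and the proc_root
parameter are hypothetical, and an empty result is not proof that no
old backends remain.

```python
import os


def processes_using_datadir(datadir, proc_root="/proc"):
    """Return sorted PIDs whose working directory is `datadir`.

    Linux-only sketch: relies on the /proc/<pid>/cwd symlink.
    Processes we cannot inspect are silently skipped.
    """
    target = os.path.realpath(datadir)
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            cwd = os.readlink(os.path.join(proc_root, entry, "cwd"))
        except OSError:
            continue  # process exited, or we lack permission
        if os.path.realpath(cwd) == target:
            pids.append(int(entry))
    return sorted(pids)
```

Other platforms would need a different mechanism, which is the
"OS-tailored" part.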

> (BTW, I notice that this interlock against starting a new postmaster
> appears to be broken in HEAD, which is likely not unrelated to the
> fact that the contents of postmaster.pid seem to be totally bollixed
> :-()

D'oh! Well, I hope knowing it's a problem gives some kind of glimmer
as to how to solve it :)

Is this worth writing tests for?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
