Re: How to shoot yourself in the foot: kill -9 postmaster

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alfred Perlstein <bright(at)wintelcom(dot)net>
Cc: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>, Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: How to shoot yourself in the foot: kill -9 postmaster
Date: 2001-03-06 18:10:47
Message-ID: 6536.983902247@sss.pgh.pa.us
Lists: pgsql-hackers

Alfred Perlstein <bright(at)wintelcom(dot)net> writes:
> I'm sure some sort of encoding of the PGDATA directory along with
> the pids stored in the shm segment...

I thought about this too, but it strikes me as not very trustworthy.
The problem is that there's no guarantee that the new postmaster will
even notice the old shmem segment: it might select a different shmem
key. (The 7.1 coding of shmem key selection makes this more likely
than it used to be, but even under 7.0, it will certainly fail to work
if I choose to start the new postmaster using a different port number
than the old one had. The shmem key is driven primarily by port number,
not by data directory ...)

The interlock has to be tightly tied to the PGDATA directory, because
what we're trying to protect is the files in and under that directory.
It seems that something based on file(s) in that directory is the way
to go.

The best idea I've seen so far is Hiroshi's idea of having all the
backends hold fcntl locks on the same file (probably postmaster.pid
would do fine). Then the new postmaster can test whether any backends
are still alive by trying to lock the old postmaster.pid file.
Unfortunately, I read in the fcntl man page:

    Locks are not inherited by a child process in a fork(2) system call.

This makes the idea much less attractive than I originally thought:
a new backend would not automatically inherit a lock on the
postmaster.pid file from the postmaster, but would have to open/lock it
for itself. That means there's a window where the new backend exists
but would be invisible to a hypothetical new postmaster.
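
The man page behavior quoted above can be checked with a small C sketch. This is only an illustration, not PostgreSQL code: the function name lock_is_not_inherited and the path passed to it are hypothetical, and it assumes POSIX fcntl record locks. The parent read-locks a file and forks; the child then asks the kernel, via F_GETLK on the inherited descriptor, whether a write lock would be blocked. If the lock had been inherited, the child would see no conflict; instead it should see the parent's lock as belonging to another process.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child sees the parent's
 * fcntl lock as a foreign lock (that is, the lock was NOT inherited),
 * 0 if the child sees no conflicting lock, and -1 on error. */
int lock_is_not_inherited(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Parent takes a whole-file read lock, as the postmaster would. */
    struct flock lk = {0};
    lk.l_type = F_RDLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;               /* 0 means "through end of file" */
    if (fcntl(fd, F_SETLK, &lk) < 0) {
        close(fd);
        return -1;
    }

    pid_t parent = getpid();
    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: would a write lock through the inherited fd be blocked? */
        struct flock probe = {0};
        probe.l_type = F_WRLCK;
        probe.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &probe) < 0)
            _exit(2);
        /* A conflict owned by the parent means the child holds no lock. */
        _exit((probe.l_type != F_UNLCK && probe.l_pid == parent) ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

On a system with POSIX-conforming fcntl locks, this should return 1, which is the window described above: until the child locks the file for itself, only the parent's lock exists.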

We could work around this with the following, very ugly protocol:

1. Postmaster normally maintains an fcntl read lock on its postmaster.pid
file. Each spawned backend immediately opens and read-locks
postmaster.pid, too, and holds that file open until it dies. (Thus
wasting a kernel FD per backend, which is one of the less attractive
things about this.) If the backend is unable to obtain a read lock on
postmaster.pid, then it complains and dies. We must use read locks
here so that all these processes can hold them at the same time.

2. If a newly started postmaster sees a pre-existing postmaster.pid
file, it tries to obtain a *write* lock on that file. If it fails,
conclude that an old postmaster or backend is still alive; complain
and quit. If it succeeds, sit for, say, 1 second before deleting the
file and creating a new one. (The delay here is to allow any
just-started old backends to fail to acquire a read lock and quit. A
possible objection is that we have no way to guarantee 1 second is
enough, though it ought to be plenty if the lock acquisition is just
after the fork.)

One thing that worries me a little bit is that this means an fcntl
read-lock request will exist inside the kernel for each active backend.
Does anyone know of any performance problems or hard kernel limits we
might run into with large numbers of backends (lots and lots of fcntl
locks)? At least the locks are on a file that we don't actually touch
in the normal course of business.

A small savings is that the backends don't actually need to open new FDs
for the postmaster.pid file; they can use the one they inherit from the
postmaster, even though they do need to lock it again. I'm not sure how
much that saves inside the kernel, but at least something.

There are also the usual concerns about the portability of file locking,
though this time we're locking a plain file and not a socket, so it
shouldn't be as much trouble as it was before.

Comments? Does anyone see a better way to do it?

regards, tom lane
