RE: How to shoot yourself in the foot: kill -9 postmaster

From: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Lamar Owen" <lamar(dot)owen(at)wgcr(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, "Alfred Perlstein" <bright(at)wintelcom(dot)net>
Subject: RE: How to shoot yourself in the foot: kill -9 postmaster
Date: 2001-03-07 04:38:36
Message-ID: EKEJJICOHDIEMGPNIFIJIEMBDMAA.Inoue@tpf.co.jp
Lists: pgsql-hackers

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>
> The interlock has to be tightly tied to the PGDATA directory, because
> what we're trying to protect is the files in and under that directory.
> It seems that something based on file(s) in that directory is the way
> to go.
>
> The best idea I've seen so far is Hiroshi's idea of having all the
> backends hold fcntl locks on the same file (probably postmaster.pid
> would do fine). Then the new postmaster can test whether any backends
> are still alive by trying to lock the old postmaster.pid file.
> Unfortunately, I read in the fcntl man page:
>
> Locks are not inherited by a child process in a fork(2) system call.
>

Yes. flock() works well here (a flock() lock belongs to the open file,
which the child shares after fork()), but fcntl() doesn't.
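The difference can be seen with a small experiment. This is a minimal sketch in Python, used only for brevity: fcntl.flock() calls flock(2) and fcntl.lockf() uses fcntl(2) record locks. The function names are hypothetical, and it assumes a POSIX system and a local (non-NFS) temporary file. A parent ("postmaster") takes a shared lock, forks a child ("backend") that never locks anything itself, then closes its own descriptor as if it had been killed.

```python
import errno
import fcntl
import os
import tempfile

def _locked_by_someone(path, use_flock):
    """Probe through a fresh open file: True if another holder blocks an exclusive lock.
    Note: a process never conflicts with its own fcntl() locks, so the probing
    process must not hold one itself (here the parent has already dropped it)."""
    fd = os.open(path, os.O_RDWR)
    try:
        if use_flock:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        else:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # fcntl(F_SETLK) record lock
        return False
    except OSError as e:
        if e.errno in (errno.EACCES, errno.EAGAIN, errno.EWOULDBLOCK):
            return True
        raise
    finally:
        os.close(fd)  # also releases the probe's own lock if it got one

def lock_survives_in_child(use_flock):
    """Parent locks a file, forks an idle child that keeps the inherited fd open,
    then closes its own fd. Returns whether the file is still locked."""
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    fd = os.open(path, os.O_RDWR)
    if use_flock:
        fcntl.flock(fd, fcntl.LOCK_SH)
    else:
        fcntl.lockf(fd, fcntl.LOCK_SH)
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:          # child: never calls flock()/fcntl() itself
        os.close(w)
        os.read(r, 1)     # block until the parent closes its end of the pipe
        os._exit(0)
    os.close(r)
    os.close(fd)          # parent "dies": a fcntl() lock of the parent is released here
    try:
        return _locked_by_someone(path, use_flock)
    finally:
        os.close(w)       # EOF wakes the child so it can exit
        os.waitpid(pid, 0)
        os.unlink(path)
```

With flock() the file stays locked, because the lock belongs to the open file the child inherited; with fcntl() it does not, because the lock belonged to the parent process alone.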

> This makes the idea much less attractive than I originally thought:
> a new backend would not automatically inherit a lock on the
> postmaster.pid file from the postmaster, but would have to open/lock it
> for itself. That means there's a window where the new backend exists
> but would be invisible to a hypothetical new postmaster.
>
> We could work around this with the following, very ugly protocol:
>
> 1. Postmaster normally maintains fcntl read lock on its postmaster.pid
> file. Each spawned backend immediately opens and read-locks
> postmaster.pid, too, and holds that file open until it dies. (Thus
> wasting a kernel FD per backend, which is one of the less attractive
> things about this.) If the backend is unable to obtain read lock on
> postmaster.pid, then it complains and dies. We must use read locks
> here so that all these processes can hold them separately.
>
> 2. If a newly started postmaster sees a pre-existing postmaster.pid
> file, it tries to obtain a *write* lock on that file. If it fails,
> conclude that an old postmaster or backend is still alive; complain
> and quit. If it succeeds, sit for say 1 second before deleting the file
> and creating a new one. (The delay here is to allow any just-started
> old backends to fail to acquire read lock and quit. A possible
> objection is that we have no way to guarantee 1 second is enough, though
> it ought to be plenty if the lock acquisition is just after the fork.)
>
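Steps 1 and 2 can be sketched as follows, again in Python over fcntl() record locks. This is only an illustration under stated assumptions: the function names and the `grace` parameter (the "say 1 second") are hypothetical, error handling is minimal, and a helper forks a stand-in backend so that its lock really lives in a separate process.

```python
import errno
import fcntl
import os
import time

def _try_lock(fd, exclusive):
    """Non-blocking fcntl() lock on the whole file; False if another process conflicts."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.lockf(fd, mode | fcntl.LOCK_NB)
        return True
    except OSError as e:
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return False
        raise

def backend_lock(pidfile):
    """Step 1, run by a freshly forked backend: read-lock postmaster.pid and keep the
    fd open until exit (one kernel FD per backend). None means: complain and die."""
    fd = os.open(pidfile, os.O_RDONLY)
    if _try_lock(fd, exclusive=False):
        return fd
    os.close(fd)
    return None

def new_postmaster_start(pidfile, grace=1.0):
    """Step 2. Returns an fd read-locking a fresh pidfile, or None if an old
    postmaster or backend still holds a read lock on the existing one."""
    old_fd = None
    if os.path.exists(pidfile):
        old_fd = os.open(pidfile, os.O_RDWR)
        if not _try_lock(old_fd, exclusive=True):
            os.close(old_fd)       # someone from the old installation is alive
            return None
        time.sleep(grace)          # let just-forked old backends fail step 1 and quit
        os.unlink(pidfile)
    fd = os.open(pidfile, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    os.write(fd, f"{os.getpid()}\n".encode())
    _try_lock(fd, exclusive=False)  # sketch: ignores the brief window before this lock
    if old_fd is not None:
        os.close(old_fd)           # drops the write lock on the old, unlinked file
    return fd

def fork_backend(pidfile):
    """Fork a stand-in backend that runs step 1 itself.
    Returns (pid, locked, release_fd); closing release_fd lets the child exit."""
    ready_r, ready_w = os.pipe()
    go_r, go_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(ready_r)
            os.close(go_w)
            fd = backend_lock(pidfile)
            os.write(ready_w, b"1" if fd is not None else b"0")
            if fd is not None:
                os.read(go_r, 1)   # hold the read lock until the parent closes go_w
        finally:
            os._exit(0)
    os.close(ready_w)
    os.close(go_r)
    locked = os.read(ready_r, 1) == b"1"
    os.close(ready_r)
    return pid, locked, go_w
```

A backend slower than `grace` reopens postmaster.pid by name after the swap and locks it successfully, which is exactly the objection raised at the end of step 2.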

I have another idea. My main point is not to remove the existing
pidfile. For example:
1) A newly started postmaster tries to obtain a write lock on the
first byte of the pidfile. If it fails, the postmaster quits.
2) The postmaster tries to obtain a write lock on the second byte
of the pidfile. If it fails, the postmaster quits.
3) The postmaster releases the lock obtained in 2).
4) Each backend obtains a read lock on the second byte of the
pidfile.
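The four steps map onto byte-range fcntl() locks. A minimal sketch in Python, assuming byte 0 and byte 1 exactly as described and hypothetical function names; the pidfile is rewritten in place rather than removed, so every process always locks the same file.

```python
import errno
import fcntl
import os

POSTMASTER_BYTE = 0  # write-locked by the live postmaster (step 1)
BACKEND_BYTE = 1     # read-locked by every live backend (step 4)

def _try_byte_lock(fd, mode, offset):
    """Non-blocking fcntl() lock on the single byte at `offset`."""
    try:
        fcntl.lockf(fd, mode | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
        return True
    except OSError as e:
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return False
        raise

def postmaster_start(pidfile):
    """Steps 1-3. Returns an fd that keeps byte 0 write-locked, or None (quit)."""
    fd = os.open(pidfile, os.O_RDWR | os.O_CREAT, 0o600)
    if not _try_byte_lock(fd, fcntl.LOCK_EX, POSTMASTER_BYTE):
        os.close(fd)  # 1) an old postmaster is still alive
        return None
    if not _try_byte_lock(fd, fcntl.LOCK_EX, BACKEND_BYTE):
        os.close(fd)  # 2) an old backend still holds its read lock; close drops byte 0 too
        return None
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, BACKEND_BYTE, os.SEEK_SET)  # 3)
    os.ftruncate(fd, 0)  # the file is never unlinked: update its contents in place
    os.pwrite(fd, f"{os.getpid()}\n".encode(), 0)
    return fd

def backend_attach(pidfile):
    """Step 4: read-lock byte 1; keep the returned fd open for the backend's lifetime."""
    fd = os.open(pidfile, os.O_RDONLY)
    if _try_byte_lock(fd, fcntl.LOCK_SH, BACKEND_BYTE):
        return fd
    os.close(fd)
    return None

def spawn_backend(pidfile):
    """Fork a child that performs step 4 itself (fcntl() locks are not inherited).
    Returns (pid, attached, release_fd); closing release_fd lets the child exit."""
    ready_r, ready_w = os.pipe()
    go_r, go_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(ready_r)
            os.close(go_w)
            fd = backend_attach(pidfile)
            os.write(ready_w, b"1" if fd is not None else b"0")
            if fd is not None:
                os.read(go_r, 1)  # hold the lock until the parent closes go_w
        finally:
            os._exit(0)
    os.close(ready_w)
    os.close(go_r)
    attached = os.read(ready_r, 1) == b"1"
    os.close(ready_r)
    return pid, attached, go_w
```

Because the old postmaster and the old backends are detected on different bytes, a new postmaster can tell which of them is still alive, and the file's identity never changes between generations.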

Regards,
Hiroshi Inoue
