Re: postmaster.pid

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers-win32(at)postgresql(dot)org
Subject: Re: postmaster.pid
Date: 2004-08-26 14:25:28
Message-ID: 17345.1093530328@sss.pgh.pa.us
Lists: pgsql-hackers-win32

Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com> writes:
> On startup, postmaster reads postmaster.pid, if present, and tries to connect
> to the mentioned port. If the connection fails, no postmaster is present,

... or the kernel is filtering the port, or we couldn't resolve "localhost"
(cf. various reports of the stats collector not working), or the postmaster
is present but overloaded enough to be missing connection attempts, or ...

> Only if this is received in a reasonable time, we
> are sure to have a postmaster running and should abort startup, else we can
> safely continue.

The real point here is that the behavior has to be to default to
failure, not success. The worst case if we fail incorrectly is that a
small amount of manual intervention is needed to start the postmaster,
i.e., remove the lockfile and try again. The worst (and very probable)
case if we succeed incorrectly is extensive, unrecoverable data
corruption. We must *never* have multiple postmasters running against
the same data directory. So taking an attitude of "prove that there is
a working postmaster out there" is quite backwards. You have to think
in terms of "prove that there isn't".
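
To make the "prove that there isn't" rule concrete, here is a minimal C
sketch of a fail-closed startup decision. The names ProbeResult and
safe_to_start are hypothetical and do not come from the PostgreSQL source.
The only point illustrated is that an inconclusive probe (timeout, filtered
port, failed name lookup) is treated exactly like "a postmaster is running",
and that having no evidence at all also refuses startup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one check for an existing postmaster. */
typedef enum
{
    PROBE_PROVEN_ABSENT,    /* positive evidence that nothing is running */
    PROBE_PROVEN_PRESENT,   /* positive evidence that something is running */
    PROBE_INCONCLUSIVE      /* timeout, filtered port, DNS failure, ... */
} ProbeResult;

/*
 * Return true only if every check proved absence. Any other outcome,
 * including "we could not tell", refuses startup.
 */
bool
safe_to_start(const ProbeResult *results, size_t n)
{
    assert(results != NULL || n == 0);

    if (n == 0)
        return false;       /* no evidence is not proof of absence */

    for (size_t i = 0; i < n; i++)
    {
        if (results[i] != PROBE_PROVEN_ABSENT)
            return false;   /* default to failure */
    }
    return true;
}
```

With this shape, a wrong answer costs the administrator a manual lockfile
removal; it can never lead to two postmasters on one data directory.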

(For the same reason, I am highly suspicious of the quick-fix proposals
we occasionally see to add an "rm $PGDATA/postmaster.pid" to pg_ctl or
the init script. That is nothing but a large-caliber pistol loaded,
cocked, and aimed at your foot.)

I've occasionally thought about abandoning the PID test, in favor of
relying completely on the shmem-existence test. If the shmem segment
named in the lockfile doesn't exist or has zero processes connected to
it, we could safely assume that the original postmaster is gone.
(If it has processes connected, we must abort anyway, to cover the case
where the postmaster crashed but backends remain alive.) The risk here
is that we are then *completely* at the mercy of the OS having a correct
emulation of the SysV shmem semantics, in particular the ability to
detect whether a shmem segment has other processes connected to it.
I'm not sure whether this is true on all the supported platforms.
(This being the win32 list: what about Windows?)

regards, tom lane
