Re: pid gets overwritten in OSX

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Francois Suter <dba(at)paragraf(dot)ch>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pid gets overwritten in OSX
Date: 2002-04-29 14:28:41
Message-ID: 23554.1020090521@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

Francois Suter <dba(at)paragraf(dot)ch> writes:
> The error happened again during the week-end and I was able to
> collect the following from Postgres' logfile:

> Lock file "/usr/local/pgsql/data/postmaster.pid" already exists.
> Is another postmaster (pid 217) running in "/usr/local/pgsql/data"?

> So it seems that the problem is that the postmaster.pid file can't be
> overwritten. I checked the last mod date and it is indeed left over
> from last startup. Any idea what could be causing this problem?

Well, it *could* be overwritten, but Postgres won't do it if it sees
that there is a process of that PID in the system.

What I think is happening is that there's some small variation in the
number or ordering of processes launched during system boot. Maybe one
time Postgres is PID 217, the next time it is PID 218 and some other
daemon happens to get 217. But if 217 is what is in the lockfile, and
we see *any* other existent process with PID 217, we cravenly refuse
to overwrite the lockfile.
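
As a rough illustration of the check described above (a sketch, not the
actual Postgres source): read the PID recorded in the lock file and ask
the kernel whether any process with that PID exists. The function names
are hypothetical, the first line of the file is assumed to hold the PID,
and a POSIX system is assumed. Signal 0 delivers nothing and only tests
for existence.

```python
import os


def process_exists(pid: int) -> bool:
    """Return True if any process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is delivered
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: a process exists, but it belongs to another user
    return True


def lockfile_is_stale(lockfile_path: str) -> bool:
    """Return True if the PID recorded in the lock file no longer exists.

    Assumption for this sketch: the first line of the file is the PID.
    """
    with open(lockfile_path) as f:
        recorded_pid = int(f.readline().strip())
    return not process_exists(recorded_pid)
```

Note that process_exists() cannot tell whose process it found. If an
unrelated daemon was handed the recorded PID after a reboot, the lock
file still looks live, which is the failure described in this thread.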

I have seen this sort of thing before with other daemons --- on my
system, sendmail occasionally refuses to start after a power failure &
reboot because it has the same sort of lockfile checking behavior.

We could perhaps avoid this scenario by being a little tighter about
what we will believe is a conflicting process --- for example, if PID
217 exists but isn't our same userID, don't assume it's the old
postmaster still running. But I could easily see that cure being worse
than the disease. If it ever let us start two conflicting postmasters
in the same data directory, data corruption would be the certain result.
That's exactly what the lockfile is there to prevent.
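
To make the "tighter" idea concrete, here is a hedged sketch of how the
three cases could be told apart on a POSIX system. This only illustrates
the proposal and is not code from Postgres. The function name and the
three labels are made up for the example. It relies on an unprivileged
caller getting EPERM when it probes a process it is not allowed to
signal, which generally means a process owned by a different user.

```python
import os


def classify_lock_holder(pid: int) -> str:
    """Classify the process that holds a recorded PID (POSIX only).

    Returns one of three hypothetical labels:
      "gone"       - no process has this PID, the lock file is stale
      "other-user" - a process exists but we may not signal it
      "maybe-ours" - a process exists and we are allowed to signal it
    """
    if pid <= 0:
        return "gone"  # not a valid single-process PID
    try:
        os.kill(pid, 0)  # signal 0: permission and existence check only
    except ProcessLookupError:
        return "gone"  # ESRCH
    except PermissionError:
        return "other-user"  # EPERM
    return "maybe-ours"
```

Under the proposal, only "maybe-ours" would block startup. The
distinction disappears for a caller running as root, since root may
signal any process, and "maybe-ours" still cannot prove that the process
is a postmaster rather than some other program run by the same user.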

The real problem is that the old postmaster was evidently not allowed
to shut down cleanly (else it'd have removed its lockfile). How are
you powering down the system, anyway?

regards, tom lane
