Re: Attempt to stop dead instance can stop a random process?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Attempt to stop dead instance can stop a random process?
Date: 2007-08-31 20:10:13
Message-ID: 1068.1188591013@sss.pgh.pa.us
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Well, that's not due to a bug in PostgreSQL. We're using a buggy LDAP
> implementation (not my call) which can crash things. The machine totally
> locked up after logging distress messages from that daemon, and they cycled
> power to get out of it.

Hmm. Do I correctly grasp the picture that you've got several Postgres
installations on the machine and they're all booted by startup scripts?

In this situation, it's actually not a bad idea to run each one under a
separate userid. The problem is that in successive reboots, each
postmaster will typically get almost but not exactly the same PID as
last time, since the number of processes launched earlier in system
startup is mostly but not completely deterministic. If you start all
the postmasters together, as you probably do, then there will be
occasions when one gets a PID that another one had in the previous boot
cycle. That can lead to refusal to start up: if a postmaster sees a
postmaster lock file in its data directory, containing a PID that
belongs to another live process owned by the same userid, it has to
assume that that's a conflicting postmaster and it must respect the lock
file. You can prevent that problem if each postmaster (data directory)
belongs to a different userid.
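A minimal C sketch of the ownership-sensitive liveness check described above follows. It is loosely modeled on the probe PostgreSQL's lock-file code makes with `kill(pid, 0)`, but it is simplified and is not the actual source. The function name `lock_pid_looks_live` is hypothetical. The key point is that `kill()` fails with `EPERM` when the target is a live process owned by a different userid. The sketch treats that case as a stale lock, which is why giving each data directory its own userid keeps a recycled PID from blocking startup.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (simplified sketch, not PostgreSQL source): decide
 * whether a PID read from a lock file should be treated as a live,
 * conflicting postmaster.
 *
 * kill(pid, 0) sends no signal; it only performs the existence and
 * permission checks:
 *   returns 0      -> process exists and we may signal it (same userid),
 *                     so the lock file must be respected;
 *   fails, ESRCH   -> no such process: the lock file is stale;
 *   fails, EPERM   -> process exists but belongs to another userid, so it
 *                     cannot be a postmaster for a data directory we own.
 */
static bool lock_pid_looks_live(pid_t other_pid)
{
    /* Reject bogus values: kill(0, ...) and kill(-1, ...) target groups. */
    if (other_pid <= 0)
        return false;

    /* Our own PID, or our parent's (e.g. the launching script), is no rival. */
    if (other_pid == getpid() || other_pid == getppid())
        return false;

    if (kill(other_pid, 0) == 0)
        return true;            /* live process under our own userid */

    /* ESRCH or EPERM: stale lock. Any other error: stay conservative. */
    return errno != ESRCH && errno != EPERM;
}
```

A startup routine would read the PID out of the lock file and refuse to start only when this returns true.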

(Some people prefer to fix this by having a startup script that forcibly
removes all the lockfiles before launching the postmasters. I think
that's kinda risky, although if it's done in a separate script that
you'd have no reason to run by hand, it's probably OK. Clueless folks
put the action right in the postgresql start script, meaning that a
thoughtless "service postgresql start" blows away the lock file...)
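The boot-only cleanup approach can be sketched as a small C helper. This sketch rests on two assumptions. First, the lock file is `postmaster.pid` inside each data directory. Second, the function name `remove_stale_lockfile` is hypothetical. The caveat above still applies: it removes the file unconditionally, so it belongs only in a script that runs once at boot, before any postmaster is launched.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical boot-time helper: unconditionally remove postmaster.pid
 * from one data directory. Call ONLY from a script that runs once at boot,
 * before any postmaster starts; never from the normal start script.
 * Returns 0 if the file was removed or was already absent, -1 on error.
 */
static int remove_stale_lockfile(const char *datadir)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/postmaster.pid", datadir);

    /* Refuse to act on a truncated path rather than unlink the wrong file. */
    if (n < 0 || (size_t) n >= sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* A missing lock file is fine: nothing to clean up. */
    if (unlink(path) == 0 || errno == ENOENT)
        return 0;
    return -1;
}
```

Calling this from the ordinary start script would reintroduce exactly the hazard described above, because a running postmaster's lock file would be removed too.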

BTW, I would imagine that some scenario like this preceded the problem
that you actually reported, since had all the postmasters started
successfully, they'd all have written correct lockfiles.

regards, tom lane
