
Re: Attempt to stop dead instance can stop a random process?

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Attempt to stop dead instance can stop a random process?
Date: 2007-08-31 19:41:47
Message-ID: 46D828AB.EE98.0025.0@wicourts.gov
Lists: pgsql-hackers
>>> On Fri, Aug 31, 2007 at  2:18 PM, in message <381(dot)1188587883(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote: 
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> It appears that when pg_ctl gets a stop request for a given directory, it
>> looks for a pid file in that directory and signals that pid to stop.  It
>> doesn't appear to check that the pid is for a PostgreSQL postmaster running
>> out of the given directory.  I think it should, although on a quick scan of
>> the code, I didn't see a convenient way to do that.
> 
> [ shrug... ]  AFAICS there is no way to know that.
 
I sure couldn't see a way, but I was hoping that was just a matter of my own
ignorance.
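
To make the missing check concrete, here is a minimal sketch of one
possible heuristic. It is not pg_ctl's code and not a portable answer. It
assumes Linux's /proc filesystem, assumes the first line of postmaster.pid
holds the pid, and assumes the postmaster's working directory is its data
directory. The function names are hypothetical.

```python
import os


def read_pid(data_dir):
    """Return the pid on the first line of postmaster.pid, or None."""
    try:
        with open(os.path.join(data_dir, "postmaster.pid")) as f:
            return int(f.readline().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pid file.
        return None


def pid_runs_from(pid, data_dir):
    """Linux-only heuristic: is the process's cwd the data directory?"""
    try:
        cwd = os.readlink(f"/proc/{pid}/cwd")
    except OSError:
        # No such process, no /proc, or no permission to inspect it.
        return False
    return os.path.realpath(cwd) == os.path.realpath(data_dir)


def safe_to_signal(data_dir):
    """True only if the recorded pid appears to run out of data_dir."""
    pid = read_pid(data_dir)
    return pid is not None and pid_runs_from(pid, data_dir)
```

Even where this works, the pid could be reused between the check and the
signal, so it only narrows the window rather than closing it.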
 
>> I have some evidence that when we attempted to stop a PostgreSQL instance
>> which (it turned out) had died without cleaning up the pid file, it actually
>> stopped another instance which was using a different data directory but
>> had wrapped around to the same pid.
> 
> The real question there is how come the postmaster died without removing
> the pidfile.  It's not that easy to crash the postmaster ...
 
Well, that's not due to a bug in PostgreSQL.  We're using a buggy LDAP
implementation (not my call) which can crash things.  The machine totally
locked up after logging distress messages from that daemon, and they cycled
power to get out of it.
 
The PostgreSQL issue here was a secondary problem in trying to get the
server back to normal.  So really, what I was suggesting was something to
improve the robustness of PostgreSQL in the face of severe challenges posed
by other issues.  I realize it's a very low volume issue; if it's not easy
to fix, probably not worth it.
 
Now to bug the people on the list of authorized contacts for Novell to open
a support case on the LDAP problems, and see how many of the 40 core dumps
I have from their daemon they want to see.
 
-Kevin
 

