Re: Attempt to stop dead instance can stop a random process?

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Attempt to stop dead instance can stop a random process?
Date: 2007-08-31 19:41:47
Message-ID: 46D828AB.EE98.0025.0@wicourts.gov
Lists: pgsql-hackers

>>> On Fri, Aug 31, 2007 at 2:18 PM, in message <381(dot)1188587883(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> It appears that when pg_ctl gets a stop request for a given directory, it
>> looks for a pid file in that directory and signals that pid to stop. It
>> doesn't appear to check that the pid is for a PostgreSQL postmaster running
>> out of the given directory. I think it should, although on a quick scan of
>> the code, I didn't see a convenient way to do that.
>
> [ shrug... ] AFAICS there is no way to know that.

I sure couldn't see a way, but I was hoping that was just a matter of my own
ignorance.
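
A rough sketch of the kind of check in question, under two loud assumptions:
a Linux-style /proc is available, and the postmaster's working directory is
its data directory. Neither is portable, which is much of the difficulty.
The function names are hypothetical, and this is not what pg_ctl does:

```python
import os
from typing import Optional


def read_pidfile_pid(data_dir: str) -> Optional[int]:
    """Return the PID on the first line of postmaster.pid, or None."""
    try:
        with open(os.path.join(data_dir, "postmaster.pid")) as f:
            pid = int(f.readline().strip())
    except (OSError, ValueError):
        return None
    # Non-positive values are treated as unusable in this sketch.
    return pid if pid > 0 else None


def cwd_matches_data_dir(pid: int, data_dir: str) -> Optional[bool]:
    """Linux-only heuristic: does /proc/<pid>/cwd point at data_dir?

    Returns None when the check cannot be made (no /proc, no
    permission, or the process is gone).
    """
    try:
        cwd = os.readlink(f"/proc/{pid}/cwd")
    except OSError:
        return None
    return os.path.realpath(cwd) == os.path.realpath(data_dir)


def safe_to_signal(data_dir: str) -> bool:
    """Only approve signalling when the heuristic positively matches."""
    pid = read_pidfile_pid(data_dir)
    if pid is None:
        return False
    return cwd_matches_data_dir(pid, data_dir) is True
```

With a stale pid file whose pid has been reused by a process running out of
a different directory, safe_to_signal() returns False instead of approving
the signal. Where /proc is missing it also returns False, so a real tool
would still need some fallback.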

>> I have some evidence that when we attempted to stop a PostgreSQL instance
>> which (it turned out) had died without cleaning up the pid file, it actually
>> stopped another instance which was using a different data directory but
>> had wrapped around to the same pid.
>
> The real question there is how come the postmaster died without removing
> the pidfile. It's not that easy to crash the postmaster ...

Well, that's not due to a bug in PostgreSQL. We're using a buggy LDAP
implementation (not my call) which can crash things. The machine totally
locked up after logging distress messages from that daemon, and they cycled
power to get out of it.

The PostgreSQL issue here was a secondary problem in trying to get the
server back to normal. So really, what I was suggesting was something to
improve the robustness of PostgreSQL in the face of severe challenges posed
by other issues. I realize it's a very low volume issue; if it's not easy
to fix, probably not worth it.

Now to bug the people on the list of authorized contacts for Novell to open
a support case on the LDAP problems, and see how many of the 40 core dumps
I have from their daemon they want to see.

-Kevin
