Re: postmaster dead but backends still running?

From: Charles Hornberger <charlie(at)hss(dot)caltech(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: postmaster dead but backends still running?
Date: 2003-06-19 17:22:47
Message-ID: Pine.LNX.4.53.0306191011140.3921@economex.caltech.edu
Lists: pgsql-admin

On Tue, 17 Jun 2003, Tom Lane wrote:
> Charles Hornberger <charlie(at)hss(dot)caltech(dot)edu> writes:
> > Other things I perhaps ought to mention: Trying to stop the postmaster
> > using pg_ctl fails (unsurprisingly, since pg_ctl relies on
> > /var/pgsql/data/postmaster.pid, which contains a nonexistent PID); I
> > haven't tried to start a new postmaster yet, because the old backends
> > are hanging around.
>
> In theory a new postmaster would detect the old backends and refuse to
> start anyway. I don't trust that interlock unreservedly though. (But
> please test it while you have the opportunity...)

Unfortunately, our system administrator fixed the problem before I got a
chance to test any further. I don't know how he went about restarting the
server, although whatever he did doesn't appear to have hurt anything.
Would it be useful to know exactly what steps he took?
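
For anyone who ends up in the same state, here is a minimal sketch of how one
might confirm that postmaster.pid is stale, i.e. that the PID it records no
longer belongs to a running process. It assumes a POSIX system and that the
first line of the file holds the postmaster's PID; the function names are
made up for illustration, and it does not check that a live PID actually
belongs to a postmaster.

```python
import os


def read_pid(pid_file):
    """Return the PID recorded on the first line of a postmaster.pid-style file."""
    with open(pid_file) as f:
        return int(f.readline().strip())


def pid_is_alive(pid):
    """Report whether some process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is delivered
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but is owned by another user
    return True


def pid_file_is_stale(pid_file):
    """True if the file names a PID that no longer exists."""
    return not pid_is_alive(read_pid(pid_file))
```

Calling pid_file_is_stale("/var/pgsql/data/postmaster.pid") would return True
in the situation described above. A False result only means that some process
has that PID, not necessarily a postmaster.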

> > Nor have I attempted to restart the web server, which might allow the
> > hanging-round backends to die by closing the old connections it's
> > holding to them. I'm tempted to go ahead and do this, though I'm not
> > sure whether I ought to until I've diagnosed what's going on right now.
>
> You will need to close all the existing connections before the new
> postmaster can be started. I'd recommend doing so sooner instead of
> later, because with no postmaster you aren't getting any checkpoints
> done, and your WAL space is going to start ballooning.
>
> As far as diagnosing the problem goes: if you have a postmaster log
> file, look to see if the postmaster wrote an ERROR or FATAL message
> before it exited. (Finding it among all the backend-level messages
> might be painful though.) Also look in the directory the postmaster
> was started in to see if there's a core file. Save away any evidence
> you can find before trying to start a new postmaster.

Interestingly, there are no messages in the log file, and I can't find a
core file -- in short, there's no evidence whatsoever, at least not that
I can find. (Though I am probably a pretty rotten detective.)
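
In case it helps someone else with the same search, this is roughly the kind
of filter one could run over a large log file. It is only a sketch: it
assumes severity tags appear in the log as "ERROR:", "FATAL:" or "PANIC:",
and it cannot tell postmaster messages from backend messages unless the log
lines carry something that distinguishes them. The function name is
hypothetical.

```python
import re

# Assumed severity tags; adjust to match the actual log format.
SEVERE = re.compile(r"\b(ERROR|FATAL|PANIC):")


def severe_lines(lines):
    """Return (line_number, text) pairs for lines carrying a severe tag."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SEVERE.search(line)
    ]
```

Since an open file iterates line by line, passing open(path) directly to
severe_lines() works, and the line numbers make it easy to go back and read
the surrounding context.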

However, I think I know the cause (though I haven't tested to see if this
indeed causes the postmaster to die): A few hours before I noticed that
the postmaster was dead, one of the sysadmins made a typo that caused an
NFS mount to become unavailable -- the very NFS mount that held the
postgres executable (all our Solaris boxes share the same executables). So
the theory is that the postmaster tried to fork() a process using a
non-existent executable, and died as a result. Does this make any sense?

-Charlie

> Because the postmaster doesn't actually do much, crashes are pretty
> unusual. I'm interested in whatever you can find.
>
> regards, tom lane
