Re: postmaster dead but backends still running?

From: Charles Hornberger <charlie(at)hss(dot)caltech(dot)edu>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: postmaster dead but backends still running?
Date: 2003-06-17 20:32:33
Message-ID: 3EEF7AE1.9070909@hss.caltech.edu
Lists: pgsql-admin

Other things I perhaps ought to mention: Trying to stop the postmaster
using pg_ctl fails (unsurprisingly, since pg_ctl relies on
/var/pgsql/data/postmaster.pid, which contains a nonexistent PID); I
haven't tried to start a new postmaster yet, because the old backends
are hanging around.
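
A minimal sketch of the stale-PID check described above, assuming a POSIX
shell such as bash and that the first line of postmaster.pid holds the
postmaster's PID. The helper name pidfile_status is hypothetical, not part
of PostgreSQL, and it should be run as the postgres user or root, since
kill -0 also fails when the caller lacks permission to signal the process.

```shell
#!/bin/sh
# pidfile_status FILE
# Hypothetical helper: prints "alive PID", "stale PID", "invalid" or "missing"
# depending on whether the PID recorded in FILE belongs to a live process.
pidfile_status() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "missing"
        return 2
    fi
    # Assumption: the first line of the file is the postmaster's PID.
    pid=$(head -n 1 "$pidfile" | tr -d ' \t\r')
    case $pid in
        ''|*[!0-9]*) echo "invalid"; return 2 ;;
    esac
    # kill -0 delivers no signal; it only tests whether the process exists
    # and whether we are allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "alive $pid"
        return 0
    else
        echo "stale $pid"
        return 1
    fi
}
```

On this machine it would be called as
pidfile_status /var/pgsql/data/postmaster.pid; a "stale" result only says
that the recorded postmaster is gone, not that the backends it spawned
have exited.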

Nor have I attempted to restart the web server, which might allow the
hanging-around backends to die by closing the old connections it's
holding to them. I'm tempted to go ahead and do this, though I'm not
sure whether I ought to until I've diagnosed what's going on right now.

In case it's relevant, I've gone back through the logs and discovered
that for the past week or so I've been occasionally running out of
connections (I was running w/ the default of 16) and getting 'FATAL:
Non-superuser connection limit exceeded' errors (about a dozen a day),
but I can't find any other related messages in the logs.

If anyone has any suggestions, I'd really appreciate your input; I'm
getting a bit antsy since my production database server is basically
halfway down and users are wondering why their web pages don't work ...

-Charlie

Charles Hornberger wrote:
> I've got what looks like a really strange situation on my hands (or else
> I've got a normal situation that I'm looking at strangely): It appears
> that the main postmaster process is dead & gone, but I have a bunch of
> backends still running.
>
> I can't connect to the database server any more, but a bunch of old
> persistent connections (which are about four days old and which I think
> are being kept alive by my web server) are still up & running; at least
> some of them are serving data to web pages.
>
> To wit:
>
> [rhodes] data/$ /usr/ucb/ps axuw | grep post
> postgres 9238 0.2 1.4 8664 5104 ? S Jun 13 3:13 /its/software/bin/postmaster
> postgres 9268 0.1 1.4 8672 5144 ? S Jun 13 3:26 /its/software/bin/postmaster
> postgres 8920 0.1 0.6 2480 2024 pts/0 R 11:08:26 0:00 bash
> postgres 9237 0.1 1.4 8664 5104 ? S Jun 13 3:01 /its/software/bin/postmaster
> root 5411 0.0 0.4 1904 1448 ? S Jun 09 0:00 /software/stow/postfix-2.0.10/libexec/postfix/master
> postfix 5413 0.0 0.4 1992 1528 ? S Jun 09 0:00 qmgr -l -t fifo -u
> postfix 8857 0.0 0.4 1960 1552 ? S 11:03:14 0:00 pickup -l -t fifo -u
> postgres 9236 0.0 1.4 8664 5120 ? S Jun 13 3:12 /its/software/bin/postmaster
> postgres 9243 0.0 1.5 8720 5584 ? S Jun 13 3:06 /its/software/bin/postmaster
> postgres 9254 0.0 1.4 8656 5128 ? S Jun 13 3:22 /its/software/bin/postmaster
> postgres 9278 0.0 1.4 8664 5192 ? S Jun 13 3:08 /its/software/bin/postmaster
> postgres 9333 0.0 1.5 8672 5312 ? S Jun 13 3:33 /its/software/bin/postmaster
> postgres 9379 0.0 1.4 8720 5176 ? S Jun 13 3:08 /its/software/bin/postmaster
> postgres 9431 0.0 1.4 8672 5112 ? S Jun 13 3:18 /its/software/bin/postmaster
> postgres 9877 0.0 0.0 2480 ? pts/0 R 11:47:15 0:00 bash
>
> The file /var/pgsql/data/postmaster.pid claims that the postmaster's PID
> is 27215; there's no process with that PID running on my system.
>
> Whenever I try to create a new connection, it fails:
>
> [rhodes] data/$ psql template1
> psql: could not connect to server: No such file or directory
>         Is the server running locally and accepting
>         connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
> [rhodes] data/$ psql -h localhost template1
> psql: could not connect to server: Connection refused
>         Is the server running on host localhost and accepting
>         TCP/IP connections on port 5432?
>
> Any ideas on what I should do now? I'm running 7.3.2 on Solaris 7.
>
> -Charlie
>

--
Charles Hornberger
Caltech
Division of the Humanities and Social Sciences
M/C 228-77
Tel (626) 395-3474
