Re: (Never?) Kill Postmaster?

From: Christian Schröder <cs(at)deriva(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: (Never?) Kill Postmaster?
Date: 2007-11-08 23:02:11
Message-ID: 47339573.5050202@deriva.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,
any news about this issue? Anything else that I can do to help you?
Meanwhile there are 4 connections in the same state. (I did not do the
whole investigation on all 4, but since they all do not respond on a
SIGINT I assume that they all have the same problem.)
It may also be interesting that the 4 processes belong to the same two
users that already caused this problem earlier. Maybe it really has
something to do with their unstable internet connection? (I mentioned
this in an earlier mail.)
I have also noticed that one of these two users has many open
connections which are all idle. I guess that those connections are in
fact dead, but the database did not close them for any reason. The
pg_stat_activity entries for the corresponding backend processes are as
follows:

procpid | usename | current_query | waiting |
query_start | backend_start
---------+---------+---------------+---------+-------------------------------+-------------------------------
26033 | dpyrek | <IDLE> | f | 2007-11-08
10:21:01.555232+01 | 2007-11-08 09:55:01.59932+01
18331 | dpyrek | <IDLE> | f | 2007-11-07
11:34:24.968852+01 | 2007-11-07 11:08:29.043762+01
18940 | dpyrek | <IDLE> | f | 2007-11-07
14:29:52.987176+01 | 2007-11-07 13:14:48.609031+01
25868 | dpyrek | <IDLE> | f | 2007-11-08
09:47:46.938991+01 | 2007-11-08 09:13:34.101351+01
6719 | dpyrek | <IDLE> | f | 2007-11-06
12:06:14.875588+01 | 2007-11-06 11:10:00.566644+01
17987 | dpyrek | <IDLE> | f | 2007-11-07
10:31:50.517275+01 | 2007-11-07 10:11:07.310338+01
31808 | dpyrek | <IDLE> | f | 2007-11-08
22:55:03.931727+01 | 2007-11-08 22:55:03.766638+01
25484 | dpyrek | <IDLE> | f | 2007-11-08
08:32:57.265377+01 | 2007-11-08 07:44:30.845967+01
5972 | dpyrek | <IDLE> | f | 2007-11-06
08:51:54.57437+01 | 2007-11-06 08:14:03.560602+01
6241 | dpyrek | <IDLE> | f | 2007-11-06
09:59:02.018452+01 | 2007-11-06 09:20:49.092246+01
6136 | dpyrek | <IDLE> | f | 2007-11-06
09:14:40.729837+01 | 2007-11-06 08:57:29.55187+01
12645 | dpyrek | <IDLE> | f | 2007-11-02
10:08:24.856929+01 | 2007-11-02 09:35:37.640976+01
25254 | dpyrek | <IDLE> | f | 2007-11-08
07:29:04.547081+01 | 2007-11-08 06:33:47.707759+01
20275 | dpyrek | <IDLE> | f | 2007-11-03
09:14:12.73829+01 | 2007-11-03 08:57:05.555972+01
20216 | dpyrek | <IDLE> | f | 2007-11-03
08:46:40.555354+01 | 2007-11-03 08:40:31.756993+01
12435 | dpyrek | <IDLE> | f | 2007-11-02
09:28:53.361365+01 | 2007-11-02 08:48:11.589485+01
19633 | dpyrek | <IDLE> | f | 2007-11-03
08:34:16.263487+01 | 2007-11-03 05:46:16.811069+01
12156 | dpyrek | <IDLE> | f | 2007-11-02
08:10:11.558427+01 | 2007-11-02 07:49:03.442489+01
4899 | dpyrek | <IDLE> | f | 2007-11-01
12:42:30.880391+01 | 2007-11-01 10:56:18.513398+01
11988 | dpyrek | <IDLE> | f | 2007-11-02
07:38:10.315758+01 | 2007-11-02 07:02:52.438251+01
4490 | dpyrek | <IDLE> | f | 2007-11-01
09:51:42.216745+01 | 2007-11-01 09:34:18.63771+01

A ptrace of one of these connections yields the following result:

db2:/home/pgsql/data # strace -p 4899
Process 4899 attached - interrupt to quit
futex(0x994000, FUTEX_WAIT, 2, NULL

This looks identical (even with the same uaddr parameter) for the other
processes.

In the log file I find many lines like this:

<2007-11-08 22:56:19 CET - dpyrek> LOG: could not receive data from
client: Die Wartezeit für die Verbindung ist abgelaufen
<2007-11-08 22:56:19 CET - dpyrek> LOG: unexpected EOF on client connection

I'm not sure that these lines correspond to the dead connections, but at
least it's the same user.

Does this additional information help you in any way? I'm a little bit
afraid that eventually the maximum number of connections will be
exceeded. I don't want to "kill -9" the processes because the last time
I did this the database was in recovery mode for a substantial amount of
time.

Any help is still highly appreciated!

Regards,
Christian

P.S.: If nothing else helps I could also grant one of you guys root
access to the database machine.

--
Deriva GmbH Tel.: +49 551 489500-42
Financial IT and Consulting Fax: +49 551 489500-91
Hans-Böckler-Straße 2 http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gregory Stark 2007-11-08 23:26:56 Re: Optimal time series sampling.
Previous Message Simon Riggs 2007-11-08 21:45:46 Re: "Resurrected" data files - problem?