Re: Feature freeze date for 8.1

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Oliver Jowett <oliver(at)opencloud(dot)com>, adnandursun(at)asrinbilisim(dot)com(dot)tr, Peter Eisentraut <peter_e(at)gmx(dot)net>, Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Feature freeze date for 8.1
Date: 2005-05-02 15:47:14
Message-ID: Pine.OSF.4.61.0505021800510.109089@kosh.hut.fi
Lists: pgsql-hackers pgsql-patches

On Mon, 2 May 2005, Hannu Krosing wrote:

> Well, I've had problems with clients which resolve DB timeouts by
> closing the current connection and establishing a new one.
>
> If it is an actual DB timeout, then all is OK: the server soon notices
> that the client connection is closed and kills itself.
>
> Problems happen when the timeout is caused by actual network problems -
> when I have 300 clients (server's max_connections=500) which try to
> reconnect after a network outage, only 200 of them can do so, as the
> server is holding on to 300 old connections.
>
> In my case this has nothing to do with locks or transactions.
>
> It would be nice if I could set up some timeout using keepalives (like
> ssh's "ProtocolKeepAlives") and use similar timeouts on client and
> server.

FWIW, I've been bitten by this problem twice with other applications.

1. We had a DB2 database with clients running on other computers on the
network. A faulty switch caused random network outages. If the connection
timed out and the client was unable to send its request to the server,
the client would notice that the connection was down, and open a new one.
But the server never noticed that the connection was dead. Eventually,
the maximum number of connections was reached, and the administrator had
to kill all the connections manually.

2. We had a custom client-server application using TCP across a network.
There was a stateful firewall between the server and the clients that
dropped the connection at night when there was no activity. After a
couple of days, the server reached the maximum number of threads on the
platform and stopped accepting new connections.

In case 1, the switch was fixed. If another switch fails, the same will
happen again. In case 2, we added an application-level heartbeat that
sends a dummy message from server to client every 10 minutes.

TCP keep-alive with a small interval would have saved the day in both
cases. Unfortunately the default interval must be >= 2 hours, according
to RFC1122.
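
For illustration only, here is a minimal Python sketch (not PostgreSQL
code) of switching keep-alive on for one socket with SO_KEEPALIVE. The
helper name enable_keepalive is made up for this example. The sketch only
turns probing on; the probe interval stays at whatever the operating
system is configured to use.

```python
import socket


def enable_keepalive(sock: socket.socket) -> bool:
    """Turn on TCP keep-alive probes for this socket.

    Hypothetical helper for illustration. Returns True if the option
    reads back as enabled.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Read the option back so the caller can confirm it took effect.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```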

On most platforms, including Windows and Linux, the TCP keep-alive
interval can't be set on a per-connection basis. The ideal solution would
be to modify the operating system to support it.

What we can do in PostgreSQL is to introduce an application-level
heartbeat. A simple "Hello world" message sent from server to client that
the client would ignore would do the trick.
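
A rough sketch of that idea in Python, under stated assumptions: the
message format (newline-terminated lines), the HEARTBEAT marker, and the
function names are all invented for illustration and are not part of the
PostgreSQL protocol. The server periodically calls send_heartbeat() on
each idle connection and drops the connection when it fails; the client
simply skips heartbeat lines.

```python
import socket

# Hypothetical heartbeat line; assumed never to occur as a real message.
HEARTBEAT = b"HEARTBEAT\n"


def send_heartbeat(conn: socket.socket) -> bool:
    """Server side: send one heartbeat line.

    Returns False when the send fails, which is the server's cue to
    close the connection and free its slot.
    """
    try:
        conn.sendall(HEARTBEAT)
        return True
    except OSError:
        return False


def read_messages(conn: socket.socket):
    """Client side: yield application lines, silently ignoring heartbeats."""
    with conn.makefile("rb") as stream:
        for line in stream:
            if line == HEARTBEAT:
                continue  # dummy message, nothing to do
            yield line.rstrip(b"\n")
```

Note that the first send after an outage usually succeeds locally; the
error only surfaces on a later send, once TCP has given up
retransmitting. So the server would detect a dead client after a few
heartbeat periods rather than immediately.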

- Heikki
