On Tue, Mar 29, 2011 at 2:54 PM, Derrick Rice <derrick(dot)rice(at)gmail(dot)com> wrote:
>> Try trussing the backend process. You may find it in a network IO wait
>> trying to send data to a client that is hung or over a socket that was
>> timed out by a firewall or network equipment.
>> Such a condition will cause the backend to be unable to hear the
>> cancel. The statement will still show as running in pg_stat_activity.
>> SIGTERM on such a backend will probably also fall on deaf ears.
> I'm aware of that condition, which is exactly what the keepalive settings
> are supposed to detect.
So I spent some time reading the Linux-2.6 TCP code, and my previous statement
is downright wrong. Keepalive is only in use when the connection is idle: no
unacknowledged data and no data waiting to be sent. When data is outstanding,
retransmission timeouts are in use instead.
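
For anyone following along, here is a minimal sketch of what the per-socket keepalive knobs look like on Linux, written in Python. The function name and the 60/10/5 values are made up for illustration; the socket options themselves (SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) are the standard Linux ones. The returned figure is only the rough detection bound for an idle connection, which is the case keepalive covers.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive and return the approximate number of seconds
    an *idle* dead connection can survive before the kernel gives up.

    The default values are illustrative assumptions, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The three tuning options are Linux-specific, so skip any that this
    # platform's socket module does not expose.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

    # Idle time before the first probe, plus one interval per unanswered probe.
    # This bound does not apply while data is unacknowledged or queued.
    return idle_s + interval_s * probes

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s), "seconds worst case while idle")
    s.close()
```

Note that none of this helps once the backend has data stuck in its send queue, since the keepalive timer is not what is running at that point.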
In any case, I would have expected a retransmission timeout. My new
hypothesis based on output from `ss' is that a firewall, NAT, or VPN of my
users is putting the connection into persist mode (setting the window size
to 0) when the end point of the connection is unresponsive. Furthermore, I
think that firewall is continuing to respond to the persist probes of my
machine until it finally decides that the end point is gone. At which point
it might be ignoring future probes, starting the retransmission timeouts for
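
A rough sketch of how one might pick the active timer out of `ss -o` output, in Python. It assumes the timer:(<timer_name>,<expire_time>,<retrans>) field described in the ss man page, where "persist" is the zero-window probe timer, "on" the retransmit timer, and "keepalive" the keepalive timer. The sample line, addresses, and function name are invented for illustration and are not taken from my machine.

```python
import re

# Matches the timer field that `ss -o` appends to a TCP socket line.
TIMER_RE = re.compile(r"timer:\((\w+),([^,]*),(\d+)\)")

def parse_timer(line):
    """Return (timer_name, expire_time, retrans) or None if no timer field."""
    m = TIMER_RE.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2), int(m.group(3))

if __name__ == "__main__":
    # Hypothetical line, shaped like ss output for a stalled connection.
    sample = ("ESTAB 0 5792 10.0.0.5:5432 10.0.0.9:51514 "
              "timer:(persist,1min12sec,4)")
    print(parse_timer(sample))
```

A connection that keeps showing the persist timer with a non-empty send queue would be consistent with the zero-window hypothesis above.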
So I'm not looking for any further help here, since this isn't a PostgreSQL
issue. If I resolve the problem I'll let you all know just for
entertainment purposes :)