Re: psycopg2 (async) socket timeout

From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Danny Milosavljevic <danny(dot)milo(at)gmail(dot)com>, psycopg(at)postgresql(dot)org
Subject: Re: psycopg2 (async) socket timeout
Date: 2011-02-15 13:32:32
Message-ID: 4D5A8070.7000200@wulczer.org
Lists: psycopg

On 15/02/11 06:39, Marko Kreen wrote:
> On Thu, Feb 3, 2011 at 10:04 PM, Danny Milosavljevic
> <danny(dot)milo(at)gmail(dot)com> wrote:
>> is it possible to specify the timeout for the socket underlying a connection?
>>
>> Alternatively, since I'm using the async interface anyway, is it
>> possible to proactively cancel a query that is "stuck" since the TCP
>> connection to the database is down?
>>
>> So the specific case is:
>> - connect to the postgres database using psycopg2 while network is up
>> - run some queries, get the results fine etc
>> - send a query
>> - the network goes down before the result to this last query has been received
>> - neither a result nor an error callback gets called - as far as I can
>> see (using txpostgres.ConnectionPool)
>>
>> What's the proper way to deal with that?
>
> TCP keepalive. By default the timeouts are quite high,
> but they are tunable.
>
> libpq supports keepalive tuning since 9.0, on older libpq
> you can do it yourself:
>
> https://github.com/markokr/skytools/blob/master/python/skytools/psycopgwrapper.py#L153
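A minimal sketch of both routes, assuming a TCP (not Unix-socket) connection. With libpq 9.0+ the keepalives, keepalives_idle, keepalives_interval and keepalives_count connection parameters can be passed straight through psycopg2.connect(); on older libpq you can set the socket options yourself on the descriptor returned by connection.fileno(), roughly what the skytools wrapper linked above does. The values below are illustrative, and set_keepalive / LIBPQ_KEEPALIVES are hypothetical names, not psycopg2 API. The per-socket TCP_KEEP* options are platform-specific (Linux has all three), hence the hasattr() guards.

```python
import socket

# libpq >= 9.0: keepalive tuning through connection parameters.
# psycopg2.connect() forwards extra keyword arguments to libpq, e.g.
#     conn = psycopg2.connect(dsn, **LIBPQ_KEEPALIVES)
LIBPQ_KEEPALIVES = {
    "keepalives": 1,            # turn SO_KEEPALIVE on
    "keepalives_idle": 60,      # idle seconds before the first probe
    "keepalives_interval": 10,  # seconds between unanswered probes
    "keepalives_count": 5,      # lost probes before the connection is dead
}


def set_keepalive(fd, idle=60, interval=10, count=5):
    """Enable and tune TCP keepalive on an existing socket descriptor.

    Intended for older libpq, called on psycopg2's connection.fileno().
    Raises OSError if fd is not a TCP socket (e.g. a Unix-domain socket).
    """
    # fromfd() duplicates the descriptor; options set on the duplicate
    # apply to the same underlying kernel socket.
    s = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        if hasattr(socket, "TCP_KEEPIDLE"):
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        if hasattr(socket, "TCP_KEEPINTVL"):
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        if hasattr(socket, "TCP_KEEPCNT"):
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    finally:
        s.close()  # closes only the duplicate, not the connection's fd
```

As Jan points out below, this only helps while the connection is idle; it does nothing for data stuck in TCP retransmission.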

After doing lots of tests, it seems that keepalives are not the full
solution. They're useful if you want to detect the connection breaking
while it's idle, but they don't help in the case of:

* the connection is idle
* a keepalive probe is sent and a response is received
* before the next keepalive is sent, you want to do a query
* the connection breaks silently
* you try sending the query
* libpq writes the query to the connection socket, but never
receives a TCP acknowledgement
* the kernel starts retransmitting the data, using TCP's RTO algorithm
* you don't get notified about the failure until TCP gives up, which
might take a long time

So it seems to me that you also need an application-level timeout. I'm
thinking about supporting it in txpostgres, but will have to think about
exactly how to do it and what the interface would be.
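A minimal sketch of such an application-level timeout, assuming psycopg2's async mode (connect(..., async=True), spelled async_=True in current psycopg2), where you drive the connection yourself with poll() and select(). It is the usual psycopg2 wait loop with a deadline added; wait_with_timeout and QueryTimeout are hypothetical names, not psycopg2 or txpostgres API, and txpostgres would hook into the Twisted reactor rather than calling select() directly.

```python
import select
import time

try:
    from psycopg2.extensions import POLL_OK, POLL_READ, POLL_WRITE
except ImportError:  # fallback values so the sketch runs without psycopg2
    POLL_OK, POLL_READ, POLL_WRITE = 0, 1, 2


class QueryTimeout(Exception):
    """Hypothetical error: the connection made no progress before the deadline."""


def wait_with_timeout(conn, timeout):
    """Drive an async psycopg2 connection until POLL_OK or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        state = conn.poll()
        if state == POLL_OK:
            return
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise QueryTimeout("no progress within %s s" % timeout)
        fd = conn.fileno()
        if state == POLL_READ:
            ready = select.select([fd], [], [], remaining)
        elif state == POLL_WRITE:
            ready = select.select([], [fd], [], remaining)
        else:
            raise ValueError("unexpected poll() state: %r" % (state,))
        # select() returns three empty lists when it hits the timeout.
        if not any(ready):
            raise QueryTimeout("no progress within %s s" % timeout)
```

On QueryTimeout the connection is best treated as dead and closed. conn.cancel() is unlikely to help here, since a cancel request needs its own working path to the server.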

Alternatively, you can lower the kernel TCP retry parameters
(net.ipv4.tcp_retries1 and net.ipv4.tcp_retries2), which will make TCP
give up earlier. Unfortunately it seems that you can only set them
globally at the kernel level and not per connection, which IMHO is a bit
too scary. What bothers me is that the keepalive mechanism does not
come into play while you're doing TCP retries, but that's apparently how
TCP works (at least on Linux...).
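To put a number on "a long time", here is a back-of-the-envelope model, assuming the retransmission timeout (RTO) starts at Linux's 200 ms minimum, doubles after each unanswered retransmission up to a 120 s cap, and the connection is dropped one RTO after the last retry (Linux's ip-sysctl documentation describes tcp_retries2 the same way). Real RTOs start from the measured round-trip time, so actual timeouts are at least this long. tcp_give_up_seconds is a hypothetical helper.

```python
def tcp_give_up_seconds(retries, initial_rto=0.2, max_rto=120.0):
    """Approximate time TCP keeps retransmitting before giving up.

    One RTO wait per retransmission plus a final wait after the last one,
    each RTO double the previous, capped at max_rto.
    """
    return sum(min(initial_rto * 2 ** i, max_rto) for i in range(retries + 1))


for r in (15, 8, 5):  # 15 is the default net.ipv4.tcp_retries2
    print(r, round(tcp_give_up_seconds(r), 1))
# 15 924.6
# 8 102.2
# 5 12.6
```

So with the default of 15 a stuck write can go unnoticed for over 15 minutes, while something like `sysctl -w net.ipv4.tcp_retries2=5` cuts that to seconds, for every TCP connection on the machine.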

If you want to detect the connection failing as soon as possible, and
not the next time you try to make a query, you need to regularly make
queries, IOW have a heartbeat. But all the things I wrote before still
apply, and without an app-level timeout or lowering the TCP retry
parameters it might take a lot of time to detect that the heartbeat failed.
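A heartbeat combined with an app-level timeout could look like the following sketch. It uses asyncio only to stay self-contained (txpostgres is built on Twisted, where reactor.callLater would play the same role); `ping` is a hypothetical coroutine that would run something cheap like SELECT 1 on the connection, and heartbeat / HeartbeatFailed are illustrative names.

```python
import asyncio


class HeartbeatFailed(Exception):
    """Hypothetical error: a heartbeat query did not finish in time."""


async def heartbeat(ping, interval, timeout):
    """Run `ping` every `interval` seconds; fail if one takes over `timeout`."""
    beats = 0
    while True:
        try:
            # wait_for cancels the stuck ping when the timeout expires,
            # instead of waiting for TCP to give up on its own.
            await asyncio.wait_for(ping(), timeout)
        except asyncio.TimeoutError as exc:
            raise HeartbeatFailed(
                f"heartbeat {beats + 1} timed out after {timeout} s"
            ) from exc
        beats += 1
        await asyncio.sleep(interval)
```

The timeout bounds how long a dead connection can go unnoticed to roughly interval + timeout, independent of the kernel's retry settings.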

Cheers,
Jan
