Re: psycopg2 (async) socket timeout

From: Marko Kreen <markokr(at)gmail(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Danny Milosavljevic <danny(dot)milo(at)gmail(dot)com>, psycopg(at)postgresql(dot)org
Subject: Re: psycopg2 (async) socket timeout
Date: 2011-02-15 20:55:57
Message-ID: AANLkTi=N4xwp74=QJ8GsjYyXCDsVEF2b+oK9q270-o4Q@mail.gmail.com
Lists: psycopg

On Tue, Feb 15, 2011 at 3:32 PM, Jan Urbański <wulczer(at)wulczer(dot)org> wrote:
> On 15/02/11 06:39, Marko Kreen wrote:
>> On Thu, Feb 3, 2011 at 10:04 PM, Danny Milosavljevic
>> <danny(dot)milo(at)gmail(dot)com> wrote:
>>> is it possible to specify the timeout for the socket underlying a connection?
>>>
>>> Alternatively, since I'm using the async interface anyway, is it
>>> possible to proactively cancel a query that is "stuck" since the TCP
>>> connection to the database is down?
>>>
>>> So the specific case is:
>>> - connect to the postgres database using psycopg2 while network is up
>>> - run some queries, get the results fine etc
>>> - send a query
>>> - the network goes down before the result to this last query has been received
>>> - neither a result nor an error callback gets called - as far as I can
>>> see (using txpostgres.ConnectionPool)
>>>
>>> What's the proper way to deal with that?
>>
>> TCP keepalive.  By default the timeouts are quite high,
>> but they are tunable.
>>
>> libpq supports keepalive tuning since 9.0, on older libpq
>> you can do it yourself:
>>
>>   https://github.com/markokr/skytools/blob/master/python/skytools/psycopgwrapper.py#L153

Keepalive will help to detect if the TCP connection is down;
it will not help if the connection is up but the server app is unresponsive.
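
For readers on an older libpq, the do-it-yourself keepalive tuning mentioned in the quoted text above can be sketched roughly as follows. This is only an illustration, not the code behind the link: the function name enable_keepalive and its default values are made up here, and the TCP_KEEP* constants are the Linux names, so they are applied only when the running Python exposes them.

```python
import socket


def enable_keepalive(sock, idle=240, interval=15, count=4):
    """Turn on TCP keepalive for sock and tune it where the platform allows.

    idle     - seconds of silence before the first probe is sent
    interval - seconds between unanswered probes
    count    - unanswered probes before the kernel drops the connection
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options are platform-specific (Linux names here),
    # so only set the ones this Python build actually provides.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

With psycopg2 the socket object would presumably be obtained from the connection's file descriptor, for example via socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM); treat that step as an assumption about your setup rather than a recipe.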

> After doing lots of tests, it seems that keepalives are not the full
> solution. They're useful if you want to detect the connection breaking
> while it's idle, but they don't help in the case of:
>
> * the app sends a keepalive, receives response

Sort of true, except Postgres does not have an app-level
keepalive (except SELECT 1). The PQping mentioned
earlier creates a new connection.

> * the connection is idle
> * before the next keepalive is sent, you want to do a query
> * the connection breaks silently
> * you try sending the query
> * libpq tries to write the query to the connection socket, does not
> receive TCP confirmation

The TCP keepalive should help for those cases, perhaps
you are doing something wrong if you are not seeing the effect.

> * the kernel starts retransmitting the data, using TCP's RTO algorithm
> * you don't get notified about the failure until the TCP gives up, which
> might be a long time

I'm not familiar with RTO, so cannot comment.

Why would it stop keepalive from working?

> So it seems to me that you need an application-level timeout also. I'm
> thinking about supporting it in txpostgres, but will have to think
> exactly how to do it and what would be the interface.
>
> Alternatively, you can lower the kernel TCP retry parameters
> (net.ipv4.tcp_retries1 and net.ipv4.tcp_retries2), which will make TCP
> give up earlier. Unfortunately it seems that you can only set them
> globally at the kernel level and not per connection, which IMHO is a bit
> too scary. What bothers me is that the keepalives mechanism does not
> come into play while you're doing TCP retries, but that's apparently how
> TCP works (at least on Linux...).
>
> If you want to detect the connection failing as soon as possible, and
> not the next time you try to make a query, you need to regularly make
> queries, IOW have a heartbeat. But all the things I wrote before still
> apply, and without an app-level timeout or lowering the TCP retry
> parameters it might take a lot of time to detect that the heartbeat failed.
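
The application-level timeout discussed in the quoted text could look roughly like the sketch below. It is a blocking wait loop for a psycopg2 async connection and is not how txpostgres does or would do it (that would go through the Twisted reactor). The names wait_with_timeout and QueryTimeout are hypothetical, and the POLL_* constants are defined locally on the assumption that they match the values in psycopg2.extensions, so that the snippet has no dependencies.

```python
import select
import time

# Assumed to mirror psycopg2.extensions.POLL_OK / POLL_READ / POLL_WRITE.
POLL_OK, POLL_READ, POLL_WRITE = 0, 1, 2


class QueryTimeout(Exception):
    """Raised when the connection makes no progress before the deadline."""


def wait_with_timeout(conn, timeout):
    """Drive conn.poll() until it reports POLL_OK or timeout seconds pass.

    conn only needs poll() and fileno(), as psycopg2 async connections have.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = conn.poll()
        if state == POLL_OK:
            return
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise QueryTimeout("no result within %s seconds" % timeout)
        if state == POLL_READ:
            # Wait for data, but never past the overall deadline.
            select.select([conn.fileno()], [], [], remaining)
        elif state == POLL_WRITE:
            select.select([], [conn.fileno()], [], remaining)
        else:
            raise RuntimeError("unexpected poll state: %r" % (state,))
```

On QueryTimeout the caller would most likely close the connection and reconnect, since the state of a half-sent query is unknown.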

The need for a periodic query is exactly the thing that keepalive
should fix. OTOH, if you have connections that are idle for a long
time, you could simply drop them.

We have the (4m idle + 4x15sec ping) parameters as
default and they work fine - a dead connection is killed
after 5m.
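
As a rough illustration of those numbers: with libpq 9.0 or later the same settings can be passed as the connection parameters keepalives, keepalives_idle, keepalives_interval and keepalives_count. The helper names below are made up for this sketch, and the arithmetic is the simple idle-plus-probes estimate for an otherwise idle connection.

```python
def keepalive_dsn(base_dsn, idle=240, interval=15, count=4):
    """Append libpq keepalive parameters to an existing DSN string."""
    return ("%s keepalives=1 keepalives_idle=%d "
            "keepalives_interval=%d keepalives_count=%d"
            % (base_dsn, idle, interval, count))


def detection_time(idle, interval, count):
    """Seconds until an idle, dead connection is dropped by the kernel."""
    return idle + interval * count


# 4 minutes idle plus 4 probes 15 seconds apart gives 300 seconds, i.e. 5m.
print(detection_time(240, 15, 4))
print(keepalive_dsn("dbname=test"))
```

The resulting string would then be handed to psycopg2.connect() as usual.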

--
marko
