Re: psycopg2 (async) socket timeout

From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Danny Milosavljevic <danny(dot)milo+ml(at)gmail(dot)com>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: psycopg2 (async) socket timeout
Date: 2011-02-14 19:16:06
Message-ID: 4D597F76.40204@wulczer.org
Lists: psycopg

On 14/02/11 19:59, Danny Milosavljevic wrote:
> Hi,
>
> 2011/2/9 Jan Urbański <wulczer(at)wulczer(dot)org>:
>> ----- Original message -----
>> I'll try to reproduce this problem, AIUI you should have the Deferred errback if the connection is lost, but perhaps it takes some time for Twisted to detect it (actually it takes time for the kernel to detect it). You might try playing with your TCP keepalive settings.
>
> I'm trying. No luck so far...
>
> http://twistedmatrix.com/trac/wiki/FrequentlyAskedQuestions says "If
> you rely on TCP timeouts, expect as much as two hours (the precise
> amount is platform specific) to pass between when the disruption
> occurs and when connectionLost is called". Oops.

Yup, the default TCP keepalive settings are quite high (on Linux, for instance, the kernel waits two hours of idleness before sending the first probe)...
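For reference, here is a minimal sketch of tightening keepalives on a plain Python socket. It assumes a Linux-style platform: the TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT constants are simply skipped where the socket module doesn't expose them, and the idle/interval/count values are illustrative, not recommendations. The kernel sends these probes itself, so they don't depend on the application actively reading. With a libpq-based driver you would normally pass the equivalent keepalives_idle, keepalives_interval and keepalives_count connection parameters rather than touching the socket directly.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle: int = 60,
                         interval: int = 10, count: int = 5) -> None:
    """Ask the kernel to probe an idle TCP connection.

    With these (illustrative) values a silent peer is declared dead after
    roughly idle + interval * count seconds, instead of waiting for the
    platform default (two hours on Linux) before probing even starts.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning knobs are platform specific (Linux names shown);
    # skip any that this platform's socket module does not expose.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

Call it on the socket right after creating it, before or after connecting; the options apply for the lifetime of the connection.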

> Hmm, even when I connect, then just down the network interface and
> only after that call runQuery, it is also never calling back anything
> (well, I didn't wait more than half an hour per try so far).
>
> But good point, although does this even work for async sockets? -
> where you are not reading actively, that is, nobody knows you want to
> receive any data? If that worked, that would be the nicest fix. For
> the not-so-nice fix, read on :-)

AFAIK, if you're connected over TCP and waiting for data from the
other side, and the other side never sends you anything (for
instance because it died without even sending you a RST packet), you
have no way of detecting that short of sending something every now
and then and, if there's no response, assuming the connection is down.

So you actually *need* a heartbeat to be able to detect the network
dying... I think the best idea would be to start a timer every time you
start a query and cancel it when the query finishes, and (important) to
set that timer's timeout only a little higher than the query timeout
on the server (statement_timeout). This way, if your code times out,
the server has already given up on the query and won't keep on running it.
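A rough sketch of that per-query timer follows. It swaps in Python's asyncio for Twisted (asyncio.wait_for plays the role of a reactor.callLater that gets cancelled when the result arrives). fake_query and the on_timeout callback are hypothetical stand-ins, and the 0.5 s default margin is an arbitrary example; the server side would be configured with something like SET statement_timeout = 2000 (milliseconds).

```python
import asyncio


async def run_with_watchdog(query, statement_timeout, on_timeout, margin=0.5):
    """Await `query`, giving up slightly after the server-side timeout.

    statement_timeout: the server's statement_timeout in seconds (assumed to
        be known to the client).
    margin: extra grace so the server cancels the query before we give up.
    on_timeout: called when the client-side timer fires, e.g. to close the
        connection.
    """
    try:
        return await asyncio.wait_for(query, timeout=statement_timeout + margin)
    except asyncio.TimeoutError:
        on_timeout()
        raise


async def fake_query(delay, result):
    """Hypothetical stand-in for a real asynchronous query."""
    await asyncio.sleep(delay)
    return result
```

For example, asyncio.run(run_with_watchdog(fake_query(0.1, "ok"), 2.0, close_connection)) returns "ok", where close_connection is whatever hypothetical cleanup you want; a query that outlives 2.5 s instead triggers the callback and re-raises the timeout.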

> I've now started to do it the way Daniele and you suggested ("just
> close it from the client"), so I modified the Connection to start a
> timer which will fire if I don't defuse it early enough (and modified
> ConnectionPool to check connections periodically and reconnect).

Well, something like that ;) I'd actually try doing it on a per-query
level. Since you can't have more than one outstanding query, your
keepalive won't be sent until the current query finishes.

Actually, libpq recently got a function called PQping that just checks
whether the server is up and accepting connections. So you can have
timeouts on your queries and periodic PQpings when you're not running
anything. Reminds me: psycopg2 needs to support PQping, but that should
be easy.
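One way to combine the two ideas, sketched under assumptions: queries and heartbeats share a single lock, so a ping is only sent while no query is outstanding, and each query carries its own timeout. This again uses asyncio rather than Twisted; execute, ping and on_dead are hypothetical callables (a ping could be as simple as a SELECT 1 round trip), and the sketch does not call PQping itself.

```python
import asyncio


class IdlePinger:
    """Serialise queries and send heartbeats only while no query is running.

    Sketch only: `execute` and `ping` are hypothetical coroutine functions
    standing in for real driver calls; `on_dead` is whatever should happen
    when the connection is presumed lost (close it, reconnect, ...).
    Create instances inside a running event loop.
    """

    def __init__(self, execute, ping, on_dead, ping_timeout=1.0):
        self._execute = execute
        self._ping = ping
        self._on_dead = on_dead
        self._ping_timeout = ping_timeout
        self._busy = asyncio.Lock()  # at most one outstanding command

    async def query(self, sql, timeout):
        """Run one query under the lock, with its own client-side timeout."""
        async with self._busy:
            return await asyncio.wait_for(self._execute(sql), timeout)

    async def heartbeat(self, every, rounds):
        """Ping `rounds` times, `every` seconds apart, skipping busy periods.

        Returns False (after calling on_dead) as soon as a ping fails or
        times out, True if every attempted ping succeeded.
        """
        for _ in range(rounds):
            await asyncio.sleep(every)
            if self._busy.locked():
                continue  # a query is in flight; its own timeout covers it
            async with self._busy:
                try:
                    await asyncio.wait_for(self._ping(), self._ping_timeout)
                except (asyncio.TimeoutError, OSError):
                    self._on_dead()
                    return False
        return True
```

In practice heartbeat() would run as a background task (asyncio.create_task) for as long as the connection is open, while query() is used for normal traffic.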

> After I receive a response, I defuse the timer. If not, the timer
> callback will be run. It will call the errback - which will call
> connection.close().
>
> As far as noticing the "disconnect" (well, potential disconnect) goes,
> this works perfectly.
> However, doing a connection.close() then doesn't seem to help much,
> still investigating why... getting the following:
>
> File "/usr/lib/python2.6/site-packages/twisted/internet/selectreactor.py",
> line 104, in doSelect
> [], timeout)
> exceptions.ValueError: file descriptor cannot be a negative integer (-1)
>
> So it seems the FD of the closed connection to postgres is still in
> the Twisted reactor?
> Seems I am missing some calls to self.reactor.removeReader or -Writer,
> maybe. Do those belong in Connection.close() ?

Ha, it always comes back to the ticket I filed when writing txpostgres:
http://twistedmatrix.com/trac/ticket/4539

Believe it or not, this problem also seems to prevent a proper
LISTEN/NOTIFY implementation...
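For what it's worth, the ValueError in the traceback quoted above is what CPython's select.select() raises when an object it is handed reports fileno() == -1, which is what a Python socket does once closed. The stdlib-only sketch below (selectors rather than Twisted) illustrates the ordering that avoids it: stop watching the descriptor first, then close it. In Twisted the corresponding calls would be reactor.removeReader() / reactor.removeWriter() on the connection object before closing it; treat this as an illustration of the ordering, not as a fix for txpostgres.

```python
import selectors
import socket


def stop_watching_and_close(sel: selectors.BaseSelector,
                            sock: socket.socket) -> None:
    """Close a socket that an event loop is watching, in the safe order.

    Unregister first, while sock.fileno() is still a valid descriptor; after
    close() it returns -1, and select.select() rejects -1 with
    "ValueError: file descriptor cannot be a negative integer (-1)".
    """
    try:
        sel.unregister(sock)
    except KeyError:
        pass  # not registered, nothing to remove
    sock.close()
```

After this, sel.select(timeout=0) keeps working normally, whereas passing the closed socket straight to select.select() reproduces the error from the traceback.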

> If I try to reconnect periodically, can I use the same txpostgres
> Connection instance and just call connect() again?

I think you can, although recreating the Connection object should not be
a problem either.
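If you go the recreate-the-object route, a periodic reconnect can follow the shape sketched below. factory is a hypothetical callable that builds and connects a fresh connection; in txpostgres connecting is asynchronous, so a real version would chain Deferreds instead of blocking. This synchronous version only illustrates retry with exponential backoff, and the delays are arbitrary.

```python
import time


def reconnect(factory, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Create a fresh connection object, retrying with exponential backoff.

    factory: hypothetical zero-argument callable that builds *and* connects a
        new connection, raising OSError on failure (adapt this to whatever
        your driver raises). A new object per attempt means no state leaks
        over from the dead connection.
    sleep: injectable so the backoff can be exercised without waiting.
    """
    delay = base_delay
    last_error = None
    for attempt in range(attempts):
        try:
            return factory()
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:  # no pointless wait after the last try
                sleep(delay)
                delay *= 2
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

A pool would call this when its periodic check finds a dead connection and swap the returned object in place of the old one.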

Jan
