Re: BUG #3995: pqSocketCheck doesn't return

From: "Vivek Gupta" <vivek(dot)gupta(at)globallogic(dot)com>
To: <pgsql-bugs(at)postgresql(dot)org>
Cc: <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <kyouko(dot)noro(at)hp(dot)com>
Subject: Re: BUG #3995: pqSocketCheck doesn't return
Date: 2008-06-03 10:42:01
Message-ID: 478174F395A7E34BBD3891A50303D602378BA4@ex3-del1.synapse.com
Lists: pgsql-bugs

Hi,

Having spent some time analyzing the root cause, the problem appears to
be that the 'poll ()' call is made without a timeout. Suppose connection
pooling is enabled, so that the Driver Manager tries to reuse an existing
connection after checking its state with a probe query. Having sent the
query over the DB connection, which is actually a TCP connection, it
calls 'poll ()' on the associated 'fd' for POLLIN and POLLERR events and
waits for the query result with no timeout. In addition, no keep-alive
is enabled on the underlying TCP connection.
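
To illustrate the difference a timeout makes, here is a minimal sketch of a bounded wait on a socket descriptor. It is not libpq code: the function name wait_readable_with_timeout and its return convention are assumptions made up for this example. It only shows that passing a non-negative timeout to 'poll ()' lets the caller regain control when no reply arrives.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper (not part of libpq): wait until 'fd' is readable,
 * has an error/hangup pending, or 'timeout_ms' milliseconds elapse.
 *
 * Returns 1 if an event is pending, 0 on timeout, -1 on poll() failure.
 */
int wait_readable_with_timeout(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;   /* POLLERR and POLLHUP are reported regardless */
    pfd.revents = 0;

    do {
        /* A negative timeout would block indefinitely; we pass a bound. */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);  /* simplification: restarts the full timeout */

    if (rc < 0)
        return -1;  /* poll() itself failed */
    if (rc == 0)
        return 0;   /* timed out: caller can treat the connection as dead */
    return 1;       /* data, error, or hangup is pending on fd */
}
```

A caller could treat a return value of 0 as a failed probe and discard the pooled connection instead of blocking.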

Given this data flow, two scenarios are possible:

1. While the query data is being sent over the DB connection, i.e. the
underlying TCP connection, suppose the TCP segment is never acknowledged
because the DB has gone down and is unreachable. In this case the TCP
stack retransmits and 'poll ()' eventually returns with an error.
However, it takes approx. 15 min. for the TCP stack to report the error
to the application and for 'poll ()' to return.

2. Consider another scenario where the DB goes down after acknowledging
the query data at the TCP level but before sending the query result. In
this case the local TCP stack reports no error, since the segment has
already been acknowledged, and the 'poll ()' system call could be stuck
forever waiting for the query response. An application thread could
therefore hang indefinitely.
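
For the second scenario, TCP keep-alive is one way to let the local stack notice a dead peer even when nothing is in flight. The sketch below is an assumption about how it could be enabled on the connection's socket, not existing libpq behaviour: enable_tcp_keepalive is a made-up name, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are platform-specific options (available on Linux, for example), so they are guarded with #ifdef.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of libpq): turn on TCP keep-alive probes
 * for socket 'fd'. Where the platform supports it, also set the idle time
 * before the first probe, the interval between probes, and the number of
 * unanswered probes after which the connection is declared dead.
 *
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
int enable_tcp_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;

#ifdef TCP_KEEPIDLE
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
#endif
#ifdef TCP_KEEPINTVL
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) < 0)
        return -1;
#endif
#ifdef TCP_KEEPCNT
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
        return -1;
#endif

    return 0;
}
```

With, say, idle_s = 60, interval_s = 10 and count = 5, an unreachable peer would be detected after roughly 60 + 10 * 5 = 110 seconds of silence, at which point a pending 'poll ()' on the socket returns with an error instead of waiting forever.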

With regards,

Vivek Gupta
