Re: Hung JDBC connections

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Mike Charnoky <noky(at)nextbus(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Hung JDBC connections
Date: 2012-06-10 03:39:05
Message-ID: 4FD416D9.8020808@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

On 06/08/2012 10:20 PM, Mike Charnoky wrote:
> Hi,
>
> Yesterday we experienced a problem whereby hundreds of JDBC client
> connections remained in a hung state, despite the fact that the
> PostgreSQL server was actually rebooted.
From the stack trace below, that would seem to be clients that were
actively in a query and receiving data from the server at the time.
> The strange thing is that on the client side, according to "netstat"
> the TCP connections between the JDBC clients and PostgreSQL were still
> in the ESTABLISHED state (even though the server was rebooted).
Presumably your server wasn't rebooted cleanly, or at least PostgreSQL
wasn't able to shut down cleanly, so no RST or FIN packets were sent for
connections.

If a tcp client doesn't receive a FIN or RST from its peer, it won't
know the peer is going away and will continue waiting indefinitely. The
wait will only terminate if:

- The application using the tcp socket has set a timeout or alarm
that'll interrupt the read(), allowing it to decide that the remote peer
might've gone away and unilaterally close the connection; or

- The operating system has tcp keepalives configured, and concludes
after several missed keepalives that the remote peer is dead. The
operating system will close the socket and return control to the client
with an error flag set, allowing the client to continue.

If you don't want the default behaviour of the TCP stack, you need to
set a tcp keepalive globally for the OS, get your program (in this case
Java + PgJDBC) to set a keepalive flag on its sockets, or get the client
program to set an alarm to interrupt blocked socket reads after a timeout.

PgJDBC didn't offer the latter option until 9.1, so you had to use
keepalives. If you didn't use keepalives, connections would stay stuck
until whatever supervising tool decided they weren't going to respond
and terminated them. If nothing like a transaction manager was
supervising ,they'd just stay open.

> Is there a way this hang can be addressed by the JDBC driver? Again,
this is very disconcerting that the clients did not detect the server
failure. Does the connection use any sort of timeout mechanism? Thank
you for your help.

TCP can't tell the difference between "peer dead" and "peer unreachable
due to sustained connection issues or congestion", so it assumes the
peer is alive and will respond ... eventually. The wisdom of this
decision may be questionable, but at least keepalives provide a solution.

Of course, It's not very friendly for the JDBC driver to wait
indefinitely without periodically checking and seeing if the connection
was progressing, so I'm glad to see that's changed in 9.1.

--
Craig Ringer

In response to

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message wbrana 2012-06-10 09:05:37 BUG #6684: An I/O error occured while sending to the backend
Previous Message Craig Ringer 2012-06-10 03:18:50 Re: get/setReadOnly broken if default_transaction_read_only on