Best practise on closing connections in programs of unkown/unexpectedly long durations

From: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Best practise on closing connections in programs of unkown/unexpectedly long durations
Date: 2008-07-29 10:28:39
Message-ID: 200807291328.40012.achill@matrix.gatewaynet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Hi, i face the following situation,
one of our hosts serves as a UUCP hub (for those interested it is dialed in over satellite ISDN by vessels at the 7 seas),
This host also serves as a NFS server to the main host where our apps run.
A cron job at the main app host reads data from those NFS mounted directories.
The peusocode of this program is:

try {
con = java.sql.DriverManager.getConnection("jdbc:postgresql://localhost/maindb");
outerCon = java.sql.DriverManager.getConnection("jdbc:postgresql://localhost/misc");
con.setAutoCommit(false);
outerCon.setAutoCommit(false);

PreparedStatement lockst=outerCon.prepareStatement("select 1 from dummylock for update");
ResultSet lockrs = lockst.executeQuery();
lockrs.close();
lockst.close();

while (there are more file to read) {
process file;
delete file;
con.commit();
}
outerCon.commit();
}
catch {
con.rollback();
outerCon.rollback();
}
finally {
con.close()
outrCon.close();
}

The outer connection and the lockst, serve as a giant lock on the program, in order to prevent
same instances of the same program meddling around with the same directory tree where
the files are.
the above program has been executed without issues every 5 minutes for the last
5 years.
In the last month, after a major upgrade, linux kernel, hardware, java (1.4.2->1.5.0), postgresql (7.4->8.3.3)
we experienced a "FATAL: sorry, too many clients already" situation, while our max_connections was at 110.

I did some datamining type of logging placing the postgresql log in a postgresql table for processing,
and i figured out the program that caused this and the very first instance in time of it.
So i knew it was this program i am now describing. After processing some file correctly from the above while loop
the program just slept.
WE knew that the first time the problem arouse it was due to network/switces/NFS problem.
It was like the java program tried to read a file, but the read blocked infinitely.
So the lock on dummylock was never released, and the accumating programs waiting
on this lock, increased the current connections to the max_connections limit.

So, since i cant blame the network team, or find a bug in my program, i decided to "catch" situations
where this happens, and then rollback both connections and close them.

So, what i came up with is to use an additional thread which would check if a certain period has passed
and close the connections.

public class ReadXmlFromVessels implements Runnable{
static long WARNING_TIMEOUT = 10000L;
static long ERROR_TIMEOUT = 20000L;

static private java.sql.Connection con=null;
static private java.sql.Connection outerCon=null;

static private PreparedStatement lockst;

public static void main(String[] args) {
ReadXmlFromVessels arxfv = new ReadXmlFromVessels();
Thread timer = new Thread(arxfv);
timer.start();

try {
con = java.sql.DriverManager.getConnection("jdbc:postgresql://localhost/maindb");
outerCon = java.sql.DriverManager.getConnection("jdbc:postgresql://localhost/misc");
con.setAutoCommit(false);
outerCon.setAutoCommit(false);

lockst=outerCon.prepareStatement("select 1 from dummylock for update");
ResultSet lockrs = lockst.executeQuery();
lockrs.close();
lockst.close();

while (there are more file to read) {
process file;
delete file;
con.commit();
}
outerCon.commit();
}
catch {
con.rollback();
outerCon.rollback();
}
finally {
con.close()
outrCon.close();
}
}

public void run () {
try {
Thread.sleep(this.WARNING_TIMEOUT);
mailMessage("Timeout warning from ReadXMLFromVessels","ReadXMLFromVessels Exceeded "+(WARNING_TIMEOUT/(1000L))+" seconds timeout");
Thread.sleep(this.ERROR_TIMEOUT - this.WARNING_TIMEOUT);
mailMessage("Timeout error from ReadXMLFromVessels","ReadXMLFromVessels Exceeded "+(ERROR_TIMEOUT/(1000L))+" seconds timeout. \nRolling Back, Closing Connection and exiting.");
if (this.con != null) {
this.con.rollback();
this.con.close();
}
if (this.outerCon != null) {
lockst.cancel();
this.outerCon.rollback();
this.outerCon.close();
}
System.exit(1);
}
catch (Exception e) {
System.out.println("ERROR: "+e.getMessage());
e.printStackTrace();
}
finally {
if (this.con != null) {
try {
this.con.rollback();
this.con.close();
}
catch (Exception sqle) {}
}
if (this.outerCon != null) {
try {
this.outerCon.rollback();
this.outerCon.close();
}
catch (Exception sqle) {}
}
System.exit(1);
}
}
}

It seems to work ok, even the lockst.cancel works, however i'd like to ask you if you have smth better to suggest.
Thanx.
--
Achilleas Mantzios

Browse pgsql-jdbc by date

  From Date Subject
Next Message Kris Jurka 2008-07-29 17:09:56 Re: numeric type
Previous Message Peter 2008-07-29 07:36:24 Re: numeric type