Skip site navigation (1) Skip section navigation (2)

Re: Socket read fails

From: David Hustace <david(at)opennms(dot)org>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Socket read fails
Date: 2006-01-25 19:23:01
Message-ID: A174A100-FBCC-4513-8D5F-0FF5C2AF961B@opennms.org (view raw or flat)
Thread:
Lists: pgsql-jdbc
On Jan 25, 2006, at 1:03 AM, Colin Taylor wrote:


> For the benefit of anyone else who has this problem,  it seems to  
> be alleviated, at least in our case by rebooting the box every  
> night. Not ideal by any means but better than having the  
> application hang. Which kind of suggests some resource issue,  
> investing with linux sar command though doesnt reveal any process  
> or file limits being reached.
>

We have been fighting this very same problem with our open source  
project (OpenNMS) for a several months now but only on systems with  
fairly large databases (~=2.5 GB).  We recently built 8.1.2 on one  
customer's beefy Sun system (Sparc and Solaris 10) hoping to resolve  
the issue only to find it happening now more frequently now as well.   
8.1.2's performance, while running, is much better btw.

We have been through our database code over and over and we sure that  
we are closing all connections, statements, etc. though that doesn't  
seem to be the issue since our hangs on socketRead0 happen on a call  
to DriverManger.getConnection() (creating new connection) which  
appears to be different for your hang.  Here is a recent stack trace:

"OpenNMS.Poller.DefaultPollContext" prio=10 tid=0x006df638 nid=0xa5  
runnable [0x3cffe000..0x3cfffbf0]
         at java.net.SocketInputStream.socketRead0(Native Method)
         at java.net.SocketInputStream.read(SocketInputStream.java:129)
         at java.io.BufferedInputStream.fill(BufferedInputStream.java: 
218)
         at java.io.BufferedInputStream.read(BufferedInputStream.java: 
235)
         - locked <0x60fa45e8> (a java.io.BufferedInputStream)
         at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:254)
         at org.postgresql.core.v2.QueryExecutorImpl.processResults 
(QueryExecutorImpl.java:408)
         at org.postgresql.core.v2.QueryExecutorImpl.execute 
(QueryExecutorImpl.java:364)
         at org.postgresql.core.v2.QueryExecutorImpl.execute 
(QueryExecutorImpl.java:258)
         - locked <0x60fa4608> (a  
org.postgresql.core.v2.QueryExecutorImpl)
         at org.postgresql.core.v2.ConnectionFactoryImpl.runSetupQuery 
(ConnectionFactoryImpl.java:405)
         at  
org.postgresql.core.v2.ConnectionFactoryImpl.runInitialQueries 
(ConnectionFactoryImpl.java:423)
         at  
org.postgresql.core.v2.ConnectionFactoryImpl.openConnectionImpl 
(ConnectionFactoryImpl.java:83)
         at org.postgresql.core.ConnectionFactory.openConnection 
(ConnectionFactory.java:65)
         at org.postgresql.jdbc2.AbstractJdbc2Connection.<init> 
(AbstractJdbc2Connection.java:116)
         at org.postgresql.jdbc3.AbstractJdbc3Connection.<init> 
(AbstractJdbc3Connection.java:30)
         at org.postgresql.jdbc3.Jdbc3Connection.<init> 
(Jdbc3Connection.java:24)
         at org.postgresql.Driver.makeConnection(Driver.java:369)
         at org.postgresql.Driver.connect(Driver.java:245)
         at java.sql.DriverManager.getConnection(DriverManager.java:525)
         - locked <0x48cd17e8> (a java.lang.Class)
         at java.sql.DriverManager.getConnection(DriverManager.java:171)
         - locked <0x48cd17e8> (a java.lang.Class)
         at  
org.opennms.netmgt.config.DatabaseConnectionFactory.getConnection 
(DatabaseConnectionFactory.java:1125)
         at org.opennms.netmgt.utils.JDBCTemplate.doExecute 
(JDBCTemplate.java:102)
         at org.opennms.netmgt.utils.JDBCTemplate.execute 
(JDBCTemplate.java:82)
         at  
org.opennms.netmgt.poller.DefaultQueryManager.resolveOutage 
(DefaultQueryManager.java:531)
         at org.opennms.netmgt.poller.DefaultPollContext$2.run 
(DefaultPollContext.java:139)
         at  
org.opennms.netmgt.poller.pollables.PendingPollEvent.processPending 
(PendingPollEvent.java:94)
         at org.opennms.netmgt.poller.DefaultPollContext.onEvent 
(DefaultPollContext.java:184)
         - locked <0x5a373ed0> (a java.util.LinkedList)
         at org.opennms.netmgt.eventd.EventIpcManagerDefaultImpl 
$ListenerThread.run(EventIpcManagerDefaultImpl.java:169)
         at java.lang.Thread.run(Thread.java:595)



This past summer, we had someone try to debug this on the system  
level and here is what he reported:

----------- begin debugging ------------------
Thread Dump showing thread who has locked DatabaseConnectionFactory:

"EventQueueProcessor" prio=5 tid=0x008262d0 nid=0x3b runnable  
[239ff000..239ffc28]
         at java.net.SocketInputStream.socketRead0(Native Method)
         at java.net.SocketInputStream.read(SocketInputStream.java:129)
         at java.io.BufferedInputStream.fill(BufferedInputStream.java: 
183)
         at java.io.BufferedInputStream.read(BufferedInputStream.java: 
201)
         - locked <0x8183b8a8> (a java.io.BufferedInputStream)
         at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:166)
         at org.postgresql.core.QueryExecutor.executeV3 
(QueryExecutor.java:127)
         - locked <0x8183b8c8> (a org.postgresql.core.PGStream)
         at org.postgresql.core.QueryExecutor.execute 
(QueryExecutor.java:100)
         at org.postgresql.core.QueryExecutor.execute 
(QueryExecutor.java:43)
         at org.postgresql.jdbc1.AbstractJdbc1Connection.execSQL 
(AbstractJdbc1Connection.java:887)
         at  
org.postgresql.jdbc1.AbstractJdbc1Connection.openConnectionV3 
(AbstractJdbc1Connection.java:505)
         at  
org.postgresql.jdbc1.AbstractJdbc1Connection.openConnection 
(AbstractJdbc1Connection.java:214)
         at org.postgresql.Driver.connect(Driver.java:139)
         at java.sql.DriverManager.getConnection(DriverManager.java:512)
         - locked <0xf1b8f260> (a java.lang.Class)
         at java.sql.DriverManager.getConnection(DriverManager.java:171)
         - locked <0xf1b8f260> (a java.lang.Class)
         at  
org.opennms.netmgt.config.DatabaseConnectionFactory.getConnection 
(DatabaseConnectionFactory.java:1120)
         - locked <0x328b2758> (a java.util.LinkedList)
         at org.opennms.netmgt.xmlrpcd.XmlRpcNotifier.getNodeLabel 
(XmlRpcNotifier.java:355)

pstack showing the FD of java->pgsql socket read call:

-----------------  lwp# 59 / thread# 59  --------------------
ff33d2d8 read     (36, 939a18, 2000)
feccbc30 JVM_Read (36, 939a18, 2000, feccb5c8, ff364278, ff36c950) + 68
30edac88 Java_java_net_SocketInputStream_socketRead0 (826364,  
239ff100, 239ff0fc, 239ff0f8, 0, 2000) + 1e8

pfiles showing port number of the java->psql socket read call:

   54: S_IFSOCK mode:0666 dev:304,0 ino:16062 uid:0 gid:0 size:0
       O_RDWR
         SOCK_STREAM
         SO_SNDBUF(49152),SO_RCVBUF(49152)
         sockname: AF_INET 127.0.0.1  port: 50912
         peername: AF_INET 127.0.0.1  port: 5432

...after greping through the pfiles output of 17 postmaster  
processes, found
the one that has socket 50912 open:

    8: S_IFSOCK mode:0666 dev:304,0 ino:13881 uid:0 gid:0 size:0
       O_RDWR
         SOCK_STREAM
         SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(49152),SO_RCVBUF(49152)
         sockname: AF_INET 127.0.0.1  port: 5432
         peername: AF_INET 127.0.0.1  port: 50912

pstack of this postmaster process:

3029:   /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
fefbca1c recv     (8, 283df0, 2000, 0)
000f7420 pq_getbyte (285c00, 328598, 1, 6, 5cbd8, 5cc30) + 38
00156a38 SocketBackend (ffbff1d8, 49, 0, 49, 0, 2) + 4
00156d28 ReadCommand (ffbff1d8, 0, ffffffff, fffffff8, 0, 2ad800) + 20
00159c94 PostgresMain (4, 2c3850, 2c3830, 6, 0, 0) + a7c
0012e414 BackendRun (2d2038, 1084, 916e0, 0, ff1d2000, 1000) + 5b8
0012dc60 BackendStartup (2d2038, 2b3b0c, 0, 0, 2b3800, 5) + 150
0012bfdc ServerLoop (2, 2b3800, 2b3800, ffbffb98, ffbffc38, 286000) +  
304
0012b7c8 PostmasterMain (0, 2bffe8, 2c7aa0, 2bf000, 2be400, 23f000) +  
b10
000f89cc main     (3, 2bffe8, ffbffe2c, 2be6dc, ff1d0140, ff1d0180) +  
220
00039d44 _start   (0, 0, 0, 0, 0, 0) + 5c


Conclusion java is doing a read on this socket, and postmaster is  
doing a recv.
----------end debugging---------------------

We are considering the use of DataSource instead of the deprecated  
DriverManager, since DriverManager is synchronized.  Any thoughts on  
the system level debugging above or the use of the DataSource Interface?

Note that we use very many threads, however, our connection pool  
keeps the number of connection down to around 12 with peaks for 20 or  
25.

-David


David Hustace
The OpenNMS Group, Inc.
Main  : +1 919 545 2553   Fax:   +1 503-961-7746
Direct: +1 919 827 1201   Skype: dhustace
Key Fingerprint: 8EC1 F319 2C1D 3197 AAB9  580F FACD 1D19 175E D903

In response to

Responses

pgsql-jdbc by date

Next:From: Gilbert, AntoineDate: 2006-01-25 19:51:24
Subject: strange problem
Previous:From: Lucas SanabioDate: 2006-01-25 18:45:59
Subject: Inserting the wrong date

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group