Re: Socket read fails

From: David Hustace <david(at)opennms(dot)org>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Socket read fails
Date: 2006-01-25 19:23:01
Message-ID: A174A100-FBCC-4513-8D5F-0FF5C2AF961B@opennms.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

On Jan 25, 2006, at 1:03 AM, Colin Taylor wrote:

> For the benefit of anyone else who has this problem, it seems to
> be alleviated, at least in our case by rebooting the box every
> night. Not ideal by any means but better than having the
> application hang. Which kind of suggests some resource issue,
> investing with linux sar command though doesnt reveal any process
> or file limits being reached.
>

We have been fighting this very same problem with our open source
project (OpenNMS) for a several months now but only on systems with
fairly large databases (~=2.5 GB). We recently built 8.1.2 on one
customer's beefy Sun system (Sparc and Solaris 10) hoping to resolve
the issue only to find it happening now more frequently now as well.
8.1.2's performance, while running, is much better btw.

We have been through our database code over and over and we sure that
we are closing all connections, statements, etc. though that doesn't
seem to be the issue since our hangs on socketRead0 happen on a call
to DriverManger.getConnection() (creating new connection) which
appears to be different for your hang. Here is a recent stack trace:

"OpenNMS.Poller.DefaultPollContext" prio=10 tid=0x006df638 nid=0xa5
runnable [0x3cffe000..0x3cfffbf0]
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:
218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:
235)
- locked <0x60fa45e8> (a java.io.BufferedInputStream)
at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:254)
at org.postgresql.core.v2.QueryExecutorImpl.processResults
(QueryExecutorImpl.java:408)
at org.postgresql.core.v2.QueryExecutorImpl.execute
(QueryExecutorImpl.java:364)
at org.postgresql.core.v2.QueryExecutorImpl.execute
(QueryExecutorImpl.java:258)
- locked <0x60fa4608> (a
org.postgresql.core.v2.QueryExecutorImpl)
at org.postgresql.core.v2.ConnectionFactoryImpl.runSetupQuery
(ConnectionFactoryImpl.java:405)
at
org.postgresql.core.v2.ConnectionFactoryImpl.runInitialQueries
(ConnectionFactoryImpl.java:423)
at
org.postgresql.core.v2.ConnectionFactoryImpl.openConnectionImpl
(ConnectionFactoryImpl.java:83)
at org.postgresql.core.ConnectionFactory.openConnection
(ConnectionFactory.java:65)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>
(AbstractJdbc2Connection.java:116)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>
(AbstractJdbc3Connection.java:30)
at org.postgresql.jdbc3.Jdbc3Connection.<init>
(Jdbc3Connection.java:24)
at org.postgresql.Driver.makeConnection(Driver.java:369)
at org.postgresql.Driver.connect(Driver.java:245)
at java.sql.DriverManager.getConnection(DriverManager.java:525)
- locked <0x48cd17e8> (a java.lang.Class)
at java.sql.DriverManager.getConnection(DriverManager.java:171)
- locked <0x48cd17e8> (a java.lang.Class)
at
org.opennms.netmgt.config.DatabaseConnectionFactory.getConnection
(DatabaseConnectionFactory.java:1125)
at org.opennms.netmgt.utils.JDBCTemplate.doExecute
(JDBCTemplate.java:102)
at org.opennms.netmgt.utils.JDBCTemplate.execute
(JDBCTemplate.java:82)
at
org.opennms.netmgt.poller.DefaultQueryManager.resolveOutage
(DefaultQueryManager.java:531)
at org.opennms.netmgt.poller.DefaultPollContext$2.run
(DefaultPollContext.java:139)
at
org.opennms.netmgt.poller.pollables.PendingPollEvent.processPending
(PendingPollEvent.java:94)
at org.opennms.netmgt.poller.DefaultPollContext.onEvent
(DefaultPollContext.java:184)
- locked <0x5a373ed0> (a java.util.LinkedList)
at org.opennms.netmgt.eventd.EventIpcManagerDefaultImpl
$ListenerThread.run(EventIpcManagerDefaultImpl.java:169)
at java.lang.Thread.run(Thread.java:595)

This past summer, we had someone try to debug this on the system
level and here is what he reported:

----------- begin debugging ------------------
Thread Dump showing thread who has locked DatabaseConnectionFactory:

"EventQueueProcessor" prio=5 tid=0x008262d0 nid=0x3b runnable
[239ff000..239ffc28]
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:
183)
at java.io.BufferedInputStream.read(BufferedInputStream.java:
201)
- locked <0x8183b8a8> (a java.io.BufferedInputStream)
at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:166)
at org.postgresql.core.QueryExecutor.executeV3
(QueryExecutor.java:127)
- locked <0x8183b8c8> (a org.postgresql.core.PGStream)
at org.postgresql.core.QueryExecutor.execute
(QueryExecutor.java:100)
at org.postgresql.core.QueryExecutor.execute
(QueryExecutor.java:43)
at org.postgresql.jdbc1.AbstractJdbc1Connection.execSQL
(AbstractJdbc1Connection.java:887)
at
org.postgresql.jdbc1.AbstractJdbc1Connection.openConnectionV3
(AbstractJdbc1Connection.java:505)
at
org.postgresql.jdbc1.AbstractJdbc1Connection.openConnection
(AbstractJdbc1Connection.java:214)
at org.postgresql.Driver.connect(Driver.java:139)
at java.sql.DriverManager.getConnection(DriverManager.java:512)
- locked <0xf1b8f260> (a java.lang.Class)
at java.sql.DriverManager.getConnection(DriverManager.java:171)
- locked <0xf1b8f260> (a java.lang.Class)
at
org.opennms.netmgt.config.DatabaseConnectionFactory.getConnection
(DatabaseConnectionFactory.java:1120)
- locked <0x328b2758> (a java.util.LinkedList)
at org.opennms.netmgt.xmlrpcd.XmlRpcNotifier.getNodeLabel
(XmlRpcNotifier.java:355)

pstack showing the FD of java->pgsql socket read call:

----------------- lwp# 59 / thread# 59 --------------------
ff33d2d8 read (36, 939a18, 2000)
feccbc30 JVM_Read (36, 939a18, 2000, feccb5c8, ff364278, ff36c950) + 68
30edac88 Java_java_net_SocketInputStream_socketRead0 (826364,
239ff100, 239ff0fc, 239ff0f8, 0, 2000) + 1e8

pfiles showing port number of the java->psql socket read call:

54: S_IFSOCK mode:0666 dev:304,0 ino:16062 uid:0 gid:0 size:0
O_RDWR
SOCK_STREAM
SO_SNDBUF(49152),SO_RCVBUF(49152)
sockname: AF_INET 127.0.0.1 port: 50912
peername: AF_INET 127.0.0.1 port: 5432

...after greping through the pfiles output of 17 postmaster
processes, found
the one that has socket 50912 open:

8: S_IFSOCK mode:0666 dev:304,0 ino:13881 uid:0 gid:0 size:0
O_RDWR
SOCK_STREAM
SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(49152),SO_RCVBUF(49152)
sockname: AF_INET 127.0.0.1 port: 5432
peername: AF_INET 127.0.0.1 port: 50912

pstack of this postmaster process:

3029: /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
fefbca1c recv (8, 283df0, 2000, 0)
000f7420 pq_getbyte (285c00, 328598, 1, 6, 5cbd8, 5cc30) + 38
00156a38 SocketBackend (ffbff1d8, 49, 0, 49, 0, 2) + 4
00156d28 ReadCommand (ffbff1d8, 0, ffffffff, fffffff8, 0, 2ad800) + 20
00159c94 PostgresMain (4, 2c3850, 2c3830, 6, 0, 0) + a7c
0012e414 BackendRun (2d2038, 1084, 916e0, 0, ff1d2000, 1000) + 5b8
0012dc60 BackendStartup (2d2038, 2b3b0c, 0, 0, 2b3800, 5) + 150
0012bfdc ServerLoop (2, 2b3800, 2b3800, ffbffb98, ffbffc38, 286000) +
304
0012b7c8 PostmasterMain (0, 2bffe8, 2c7aa0, 2bf000, 2be400, 23f000) +
b10
000f89cc main (3, 2bffe8, ffbffe2c, 2be6dc, ff1d0140, ff1d0180) +
220
00039d44 _start (0, 0, 0, 0, 0, 0) + 5c

Conclusion java is doing a read on this socket, and postmaster is
doing a recv.
----------end debugging---------------------

We are considering the use of DataSource instead of the deprecated
DriverManager, since DriverManager is synchronized. Any thoughts on
the system level debugging above or the use of the DataSource Interface?

Note that we use very many threads, however, our connection pool
keeps the number of connection down to around 12 with peaks for 20 or
25.

-David

David Hustace
The OpenNMS Group, Inc.
Main : +1 919 545 2553 Fax: +1 503-961-7746
Direct: +1 919 827 1201 Skype: dhustace
Key Fingerprint: 8EC1 F319 2C1D 3197 AAB9 580F FACD 1D19 175E D903

In response to

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Gilbert, Antoine 2006-01-25 19:51:24 strange problem
Previous Message Lucas Sanabio 2006-01-25 18:45:59 Inserting the wrong date