Deadlock in libpq

From: Erik Hesselink <hesselink(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Deadlock in libpq
Date: 2011-03-24 09:17:27
Message-ID: AANLkTimu1kiu68P=kR2qKFoyLJC8LXo1fS6PuRe29eKQ@mail.gmail.com
Lists: pgsql-general

Hi,

We're getting a deadlock in our application (a web application with a
PostgreSQL backend) which I've traced to libpq. I ran our application
under gdb and inspected the backtraces when it hung. I found a couple
of threads I can account for (listening for new connections,
background processes) and 77 threads waiting for a mutex lock:

#0 0x00007ffff523d464 in __lll_lock_wait () from /lib/libpthread.so.0
#1 0x00007ffff52385d9 in _L_lock_953 () from /lib/libpthread.so.0
#2 0x00007ffff52383fb in pthread_mutex_lock () from /lib/libpthread.so.0
#3 0x00007ffff6160650 in ?? () from /usr/lib/libpq.so.5
==> pq_lockingcallback
#4 0x00007ffff440b791 in ?? () from /lib/libcrypto.so.0.9.8
#5 0x00007ffff440bcc9 in ?? () from /lib/libcrypto.so.0.9.8
#6 0x00007ffff47652fb in SSL_new () from /lib/libssl.so.0.9.8
#7 0x00007ffff61604dc in ?? () from /usr/lib/libpq.so.5
==> pqsecure_open_client
#8 0x00007ffff61525ce in PQconnectPoll () from /usr/lib/libpq.so.5
#9 0x00007ffff6152f5e in ?? () from /usr/lib/libpq.so.5
==> connectDBComplete
#10 0x00007ffff6153c5f in PQconnectdb () from /usr/lib/libpq.so.5
#11 0x0000000000f9b518 in sccR_info ()
#12 0x0000000000000000 in ?? ()

So it seems everything is waiting for a lock on a mutex from
pq_lockarray (in fe-secure.c, line 846). Does anybody have any idea how
this can happen? Is this something we're doing wrong (I hope so) or a
bug in libpq?
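
For context on frames #3 to #6 above: OpenSSL 0.9.8 does no locking of its own and instead calls back into the application, passing a lock slot number and a mode flag, and libpq answers with an array of mutexes (pq_lockarray). The following is a minimal sketch of that general pattern, not libpq's actual source. All demo_* names are hypothetical, and DEMO_LOCK merely stands in for OpenSSL's CRYPTO_LOCK flag so the example builds without OpenSSL.

```c
/* Sketch of a mutex-array locking callback in the style OpenSSL 0.9.8
 * expects. Every demo_* name is hypothetical; this is not libpq code. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define DEMO_LOCK 1   /* assumption: stands in for OpenSSL's CRYPTO_LOCK */

static pthread_mutex_t *demo_lockarray = NULL;
static int demo_nlocks = 0;

/* Allocate and initialise one mutex per lock slot. Returns 0 on success. */
int demo_init_locks(int nlocks)
{
    int i;

    if (nlocks <= 0)
        return -1;
    demo_lockarray = malloc(sizeof(pthread_mutex_t) * nlocks);
    if (demo_lockarray == NULL)
        return -1;
    for (i = 0; i < nlocks; i++)
        pthread_mutex_init(&demo_lockarray[i], NULL);
    demo_nlocks = nlocks;
    return 0;
}

/* Callback shape: lock slot n when DEMO_LOCK is set in mode, else unlock. */
void demo_lockingcallback(int mode, int n, const char *file, int line)
{
    (void) file;   /* unused in this sketch */
    (void) line;
    assert(n >= 0 && n < demo_nlocks);
    if (mode & DEMO_LOCK)
        pthread_mutex_lock(&demo_lockarray[n]);
    else
        pthread_mutex_unlock(&demo_lockarray[n]);
}

/* Returns 1 if slot n is currently held, 0 if it is free, -1 on bad index. */
int demo_is_locked(int n)
{
    if (n < 0 || n >= demo_nlocks)
        return -1;
    if (pthread_mutex_trylock(&demo_lockarray[n]) == 0) {
        pthread_mutex_unlock(&demo_lockarray[n]);
        return 0;
    }
    return 1;
}
```

With this shape, a slot that is locked without a matching unlock blocks every later caller that asks for the same slot in pthread_mutex_lock, which is the state the 77 threads in the backtrace are in. The sketch does not say why an unlock would go missing in our case.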

Some background: this happens only after a couple of thousand requests
(each doing about 15 database calls), with occasional other requests
coming in at the same time. Our server uses a Haskell binding to libpq
(HDBC [1] and HDBC-postgresql [2]). Both client and server run on the
same machine, running 64bit Ubuntu 10.04. The database version is
"PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
(Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit". I'm not sure how to determine
the libpq version, but it is the most recent one shipped with this
Ubuntu release. The Ubuntu changelogs suggest 8.4.7 as well. Connections
are via TCP/IP to 127.0.0.1 with SSL turned on. The machine is under
some CPU load when this happens. There is plenty of free memory.

When I turn off SSL or connect via Unix domain sockets, we get
different errors that are possibly related: occasionally, the connection
between client (our app) and server (database) is lost. On the client,
we get:

connectPostgreSQL: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

and on the server:

could not send data to client: Broken pipe

There is no further context around these messages.

Any help would be greatly appreciated.

Sincerely,

--
Erik Hesselink
http://silkapp.com

[1] http://hackage.haskell.org/package/HDBC
[2] http://hackage.haskell.org/package/HDBC-postgresql
