Re: Deadlock in libpq

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Erik Hesselink <hesselink(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Deadlock in libpq
Date: 2011-03-24 13:23:51
Message-ID: AANLkTim7GNYP_=Ud=DLZJEo6AfSrcvZum8zKOEvywQK2@mail.gmail.com
Lists: pgsql-general

On Thu, Mar 24, 2011 at 4:17 AM, Erik Hesselink <hesselink(at)gmail(dot)com> wrote:
> Hi,
>
> We're getting a deadlock in our application (a web application with a
> PostgreSQL backend) which I've traced to libpq. I've started our
> application in gdb, and when it hangs, I've inspected the backtraces.
> I've found a couple of threads I can account for (listening for new
> connections, background processes) and 77 threads waiting for a mutex
> lock:
>
> #0  0x00007ffff523d464 in __lll_lock_wait () from /lib/libpthread.so.0
> #1  0x00007ffff52385d9 in _L_lock_953 () from /lib/libpthread.so.0
> #2  0x00007ffff52383fb in pthread_mutex_lock () from /lib/libpthread.so.0
> #3  0x00007ffff6160650 in ?? () from /usr/lib/libpq.so.5
>      ==> pg_lockingcallback
> #4  0x00007ffff440b791 in ?? () from /lib/libcrypto.so.0.9.8
> #5  0x00007ffff440bcc9 in ?? () from /lib/libcrypto.so.0.9.8
> #6  0x00007ffff47652fb in SSL_new () from /lib/libssl.so.0.9.8
> #7  0x00007ffff61604dc in ?? () from /usr/lib/libpq.so.5
>      ==> pqsecure_open_client
> #8  0x00007ffff61525ce in PQconnectPoll () from /usr/lib/libpq.so.5
> #9  0x00007ffff6152f5e in ?? () from /usr/lib/libpq.so.5
>      ==> connectDBComplete
> #10 0x00007ffff6153c5f in PQconnectdb () from /usr/lib/libpq.so.5
> #11 0x0000000000f9b518 in sccR_info ()
> #12 0x0000000000000000 in ?? ()
>
> So it seems everything is waiting for a lock on a mutex from
> pq_lockarray (in fe-secure.c@846). Does anybody have any idea how this
> can happen? Is this something we're doing wrong (I hope so) or a bug
> in libpq?
>
> Some background: this happens only after a couple of thousand requests
> (each doing about 15 database calls), with occasional other requests
> coming in at the same time. Our server uses a Haskell binding to libpq
> (HDBC [1] and HDBC-postgresql [2]). Both client and server run on the
> same machine, running 64bit Ubuntu 10.04. The database version is
> "PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
> (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit". I'm not sure how to determine
> the libpq version, but it is the most recent one that ships with this
> Ubuntu release. The changelogs for Ubuntu suggest 8.4.7 as well. Connections
> are via TCP/IP to 127.0.0.1 with SSL turned on. The machine is under
> some CPU load when this happens. There is plenty of free memory.
>
> When I turned off SSL or connected via domain sockets, we got different
> errors that are possibly related: occasionally, the connection between
> client (our app) and server (database) is lost. On the client, we get:
>
>    connectPostgreSQL: server closed the connection unexpectedly
>    This probably means the server terminated abnormally
>    before or while processing the request.
>
> and on the server:
>
>    could not send data to client: Broken pipe
>
> There is no further context around these messages.
>
> Any help would be greatly appreciated.

How did you initialize SSL? You are waiting on a lock that is set up
inside the crypto library. Unless you have some kind of library
initialization issue, I doubt the problem is really inside libpq. Is
your application multithreaded, and if so, are you properly
synchronizing access to the connection object, etc.?

merlin
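
As a rough illustration of the synchronization question above: libpq's
documentation says that no two threads may use the same PGconn at the
same time, so a connection shared between threads needs a lock around
every use (or each thread needs its own connection). The sketch below
shows only that locking pattern, under stated assumptions. It does not
call libpq at all: shared_conn, run_query and run_workers are
hypothetical names, and the counter stands in for real calls such as
PQexec() on a real PGconn.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a libpq connection handle (PGconn). */
typedef struct {
    pthread_mutex_t lock;   /* guards every use of this handle */
    long queries_run;       /* pretend per-connection state */
} shared_conn;

typedef struct {
    shared_conn *conn;
    int iterations;
} worker_args;

/* Hypothetical "query": touches the handle only while holding its lock. */
static void run_query(shared_conn *conn)
{
    pthread_mutex_lock(&conn->lock);
    conn->queries_run++;    /* a real program would call libpq here */
    pthread_mutex_unlock(&conn->lock);
}

static void *worker(void *p)
{
    worker_args *args = p;
    for (int i = 0; i < args->iterations; i++)
        run_query(args->conn);
    return NULL;
}

/* Starts nthreads workers that share one handle and returns the total
 * number of "queries" run once all of them have finished. */
long run_workers(int nthreads, int iterations)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    shared_conn conn;
    worker_args args;
    long total;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    pthread_mutex_init(&conn.lock, NULL);
    conn.queries_run = 0;
    args.conn = &conn;
    args.iterations = iterations;

    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &args);
        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    total = conn.queries_run;
    pthread_mutex_destroy(&conn.lock);
    return total;
}
```

Calling run_workers(8, 1000) starts eight threads that each run 1000
guarded "queries" against the one shared handle. Without the mutex, the
updates to the shared state could interleave and be lost, which is the
kind of unsynchronized access the question above is asking about.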
