Re: BUG #5804: Connection aborted after many queries.

From: Paul Davis <paul(dot)joseph(dot)davis(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5804: Connection aborted after many queries.
Date: 2010-12-29 15:54:46
Message-ID: AANLkTi=MWhtW6xaY4eejWKWM++QVagyvY8046DRDmCJ6@mail.gmail.com
Lists: pgsql-bugs

On Wed, Dec 29, 2010 at 10:30 AM, Paul J. Davis
<paul(dot)joseph(dot)davis(at)gmail(dot)com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5804
> Logged by:          Paul J. Davis
> Email address:      paul(dot)joseph(dot)davis(at)gmail(dot)com
> PostgreSQL version: 9.0.2
> Operating system:   OS X 10.6.5, Ubuntu 10.04
> Description:        Connection aborted after many queries.
> Details:
>
> After running many queries (millions) a connection will report an error that
> the server has unexpectedly closed the connection. I first noticed this
> through psycopg2, but I've been able to reproduce it with a small C program
> using only libpq which I've included below. I compiled this against a libpq
> built by Homebrew (after upgrading the formula to use a 9.0.2 tarball) on OS
> X 10.6.5. The server was installed from 9.0.2 package available from
> https://launchpad.net/~pitti/+archive/postgresql
>
> My next step is to try building libpq with --enable-cassert to see if that
> triggers anything client side. Let me know if there's something else I
> should be doing to debug this.
>
> This test has been bailing between 2.6 and 2.7M queries:
>
>
> #include <stdio.h>
> #include <stdlib.h>
> #include "libpq-fe.h"
>
> static void
> fail(PGconn* conn, PGresult* res)
> {
>    if(res != NULL) PQclear(res);
>    PQfinish(conn);
>    exit(1);
> }
>
> static void
> check(PGconn* conn, PGresult* res, const char* fmt)
> {
>    ExecStatusType status = PQresultStatus(res);
>
>    if(status != PGRES_COMMAND_OK && status != PGRES_TUPLES_OK)
>    {
>        fprintf(stderr, fmt, PQerrorMessage(conn));
>        fail(conn, res);
>    }
> }
>
> void
> run_query(PGconn* conn, PGresult* res)
> {
>    int nFields, i, j;
>
>    res = PQexec(conn, "DECLARE myportal CURSOR FOR select 1");
>    check(conn, res, "DECLARE CURSOR failed: %s");
>    PQclear(res);
>
>    res = PQexec(conn, "FETCH ALL in myportal");
>    check(conn, res, "FETCH ALL failed: %s");
>
>    nFields = PQnfields(res);
>    for(i = 0; i < PQntuples(res); i++)
>    {
>        for(j = 0; j < nFields; j++)
>        {
>            PQgetvalue(res, i, j);
>        }
>    }
>
>    PQclear(res);
>
>    res = PQexec(conn, "CLOSE myportal");
>    check(conn, res, "CLOSE failed: %s");
>    PQclear(res);
> }
>
> int
> main(int argc, char **argv)
> {
>    PGconn* conn;
>    PGresult* res;
>    int i;
>
>    if(argc != 2)
>    {
>        fprintf(stderr, "usage: %s DSN\n", argv[0]);
>        exit(1);
>    }
>
>    conn = PQconnectdb(argv[1]);
>
>    if(PQstatus(conn) != CONNECTION_OK)
>    {
>        fprintf(stderr, "Connection failed: %s", PQerrorMessage(conn));
>        fail(conn, NULL);
>    }
>
>    res = PQexec(conn, "BEGIN");
>    check(conn, res, "BEGIN failed: %s");
>    PQclear(res);
>
>    for(i = 0; i < 10000000; i++)
>    {
>        if((i+1) % 100000 == 0)
>        {
>            fprintf(stderr, "I: %d\n", i);
>        }
>        run_query(conn, res);
>    }
>
>    res = PQexec(conn, "END");
>    check(conn, res, "END failed: %s");
>    PQclear(res);
>
>    PQfinish(conn);
>
>    return 0;
> }
>
>

I should've mentioned the various version combinations I tried this with.

Originally the client was 8.2-ish on OS X 10.5.something (an old
MacBook I no longer have) against the 8.2 server package in Ubuntu
9.04. The Python scripts where I noticed this issue ran fine against
that combination. After moving from the MacBook to a Mac Pro, I
installed Postgres 9.0.1 on the client (and built psycopg2 against
it), which is when I started getting errors. The original error with
the 9.0.1 client against the older server was that libpq would get
stuck in a poll() call when trying to fetch tuples or execute a
command.

After narrowing down the cause a bit, I tried upgrading the server to
see if it was just a weird interplay between 9.0.1 and the older
server. After upgrading to Ubuntu 10.04 and installing Postgres 8.4
(from apt), the error turned into its current manifestation: libpq
gives an error saying that the server had unexpectedly closed the
connection (instead of blocking on the poll() call).
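
To tell a client that is blocked waiting on its socket (the poll()
hang) apart from one whose peer has gone away, a bounded wait can
help. The following is only a minimal sketch of that general
technique, not what libpq does internally. The names wait_readable
and demo_wait_readable are hypothetical, and the demo uses a local
socketpair() rather than a real server connection. With libpq, the
descriptor would presumably come from PQsocket(conn) after sending a
query with PQsendQuery(); that wiring is an assumption and is not
shown.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd has data to read, the peer has
 * hung up, or timeout_ms elapses.
 * Returns 1 if poll() reported an event, 0 on timeout, -1 on error.
 * Note: a wait interrupted by a signal is restarted with the full
 * timeout, so the total wait can exceed timeout_ms.
 */
static int
wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do
    {
        rc = poll(&pfd, 1, timeout_ms);
    } while(rc < 0 && errno == EINTR);

    if(rc < 0) return -1;
    return rc > 0 ? 1 : 0;
}

/*
 * Self-contained demo on a local socket pair (no database involved):
 * a silent peer should time out, a peer that has written should not.
 * Returns 1 if both observations match, 0 if not, -1 on setup failure.
 */
static int
demo_wait_readable(void)
{
    int sv[2];
    int quiet, ready;

    if(socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;

    quiet = wait_readable(sv[0], 50);   /* nothing written yet */

    if(write(sv[1], "x", 1) != 1)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    ready = wait_readable(sv[0], 50);   /* one byte is pending */

    close(sv[0]);
    close(sv[1]);
    return (quiet == 0 && ready == 1) ? 1 : 0;
}
```

A timeout here only says that nothing arrived in time. It does not say
why, so it narrows the problem down rather than diagnosing it.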

At some point I upgraded my client install to 9.0.2 and started a
server locally. Running the test program against the local database
failed to trigger the bug. I then downgraded my local client to 8.4
and tested it against the 8.4 install on Ubuntu, which showed the
bug. Finally, I upgraded both the server and the client to 9.0.2, and
I can still trigger the bug.

Thanks,
Paul Davis
