BUG #6342: libpq blocks forever in "poll" function

From: andreagrassi(at)sogeasoft(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6342: libpq blocks forever in "poll" function
Date: 2011-12-16 07:45:42
Message-ID: E1RbSUA-0003kd-Tb@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 6342
Logged by: Andrea Grassi
Email address: andreagrassi(at)sogeasoft(dot)com
PostgreSQL version: 8.4.8
Operating system: SUSE SLES 10 SP4 64 BIT
Description:

Hi,
I have a big and strange problem. Sometimes libpq remains blocked in the
“poll” function even though the server has already answered the query. If I
attach to the process using kdbg, I find this stack:

__kernel_vsyscall()
poll() from /lib/libc.so.6
pqSocketCheck() from /home/pg/pgsql/lib-32/libpq.so.5
pqWaitTimed() from /home/pg/pgsql/lib-32/libpq.so.5
pqWait() from /home/pg/pgsql/lib-32/libpq.so.5
PQgetResult() from /home/pg/pgsql/lib-32/libpq.so.5
PQexecFinish() from /home/pg/pgsql/lib-32/libpq.so.5

To simplify the context and reproduce the bug, I wrote a test program
(attached below) that uses only the libpq interface (no other strange
libraries) to read my database at localhost.
It loops over a table of 64000 rows and, for each row, reads another table.
It generally takes 1 minute to run. I put this program in a loop, so once it
finishes, it restarts.
Usually it works fine, but sometimes (with no apparent pattern) it blocks. It
always blocks (with the stack above) while executing the PQexec function
(“CLOSE CURSOR xx” or “FETCH ALL IN xx”).
If I press “continue” in kdbg after attaching to the process, the program
continues its execution and exits successfully.
Here are the specifics of the platform (SLES 10 SP4 64-bit, WITHOUT any
VMware):

Server
HP DL 580 G7
4 CPU INTEL XEON X7550
64 GB RAM
8 HD 600GB SAS DP 6G 2.5" RAID 1 and RAID 5

OS
SUSE SLES 10 SP4 64 BIT

Kernel
Linux linuxspanesi 2.6.16.60-0.85.1-smp #1 SMP Thu Mar 17 11:45:06 UTC 2011
x86_64 x86_64 x86_64 GNU/Linux

Server Postgres
8.4.8 - 64-bit

Libpq
8.4.8 - 32-bit

I tried recompiling libpq
- in debug mode
- on a 64-bit machine with the -m32 option
- on a 32-bit machine
- with HAVE_POLL set to false at line 1053 of fe-misc.c, to force execution
of the other branch of the “#ifdef/else”, so that “select()” is used
instead of “poll()”
but none of these fixes the bug. I got the same stack as above, except in the
last case, where I got “___newselect_nocancel()” instead of “poll()”.

If I check the state of the connection using the “netstat” command, I get
this output:

tcp 24 0 127.0.0.1:49007 127.0.0.1:5432 ESTABLISHED 17415/pq_example.e

where the second field (Recv-Q) always stays stuck at a non-zero value.
It seems that the server has already answered but libpq, or the poll
function, does not notice it.
Note that the machine is very good and very fast.
It seems that the server's answer arrives before libpq starts waiting for it
(by calling poll). Could that be the case?
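As background for this question: “poll()” is level-triggered, so it should report POLLIN whenever unread bytes are already queued on the socket, regardless of whether they arrived before or after the call started. The sketch below illustrates that on a local socketpair() rather than on a real PostgreSQL connection. The helper names queued_bytes(), readable_within() and demo_data_before_poll() are hypothetical and not part of libpq, and the sketch assumes a system where ioctl(FIONREAD) works on sockets (as on Linux).

```c
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: number of unread bytes queued on fd (the value
 * netstat shows as Recv-Q), or -1 on error.
 */
int queued_bytes(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}

/* Hypothetical helper: 1 if fd is readable within timeout_ms, 0 on
 * timeout, -1 on error. */
int readable_within(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/*
 * Queue 24 bytes BEFORE polling, then poll. Stores the queued byte
 * count in *queued and returns the result of readable_within().
 */
int demo_data_before_poll(int *queued)
{
    int sv[2];
    char reply[24] = {0};
    int ready;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* The "answer" arrives before anybody is waiting for it */
    if (write(sv[1], reply, sizeof(reply)) != (ssize_t) sizeof(reply))
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    *queued = queued_bytes(sv[0]);
    ready = readable_within(sv[0], 1000);

    close(sv[0]);
    close(sv[1]);
    return ready;
}
```

In this sketch the poll succeeds even though the data was queued first, so under normal poll semantics the order of arrival alone would not be expected to cause a hang. This only demonstrates the documented behaviour on a local socket pair; it does not reproduce the reported problem.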

I tried installing a VMware virtual machine with the same version of Linux
and the same kernel version on a much less powerful machine: there my program
works fine and never blocks.

Here is the code of the example program:

/*
* testlibpq.c
*
* Test the C version of libpq, the PostgreSQL frontend library.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
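#include "libpq-fe.h"

/* Forward declaration: read_rigpia() is defined after main() */
int read_rigpia(PGconn *conn, char *cdart);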

static void
exit_nicely(PGconn *conn)
{
PQfinish(conn);
exit(1);
}

int
main(int argc, char **argv)
{
const char *conninfo;
PGconn *conn;
PGresult *res;
int i;

/*
* Connect either with PQsetdbLogin() (when CASE1 is defined; the host is
* taken from the SQLSERVER environment variable) or with a conninfo
* string, using environment variables or defaults for all other
* connection parameters.
*/

/* Make a connection to the database */
#ifdef CASE1
conn = PQsetdbLogin( getenv("SQLSERVER"), // pghost
0, // pgport
0, // pgoptions
0, // pgtty
"OSA", // dbName
0, // login
0 // pwd
);
#else
conn = PQconnectdb("dbname = OSA");
#endif

/* Check to see that the backend connection was successfully made */
if (PQstatus(conn) != CONNECTION_OK)
{
fprintf(stderr, "Connection to database failed: %s",
PQerrorMessage(conn));
exit_nicely(conn);
}

res = PQexec (conn, "SET datestyle='ISO'");
switch (PQresultStatus (res))
{
case PGRES_BAD_RESPONSE:
case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
fprintf(stderr, "SET DATESTYLE command failed: %s",
PQresultErrorMessage(res));
break;
}
PQclear(res);

/*
* Our test case here involves using a cursor, for which we must be inside
* a transaction block. We could do the whole thing with a single
* PQexec() of "select * from pg_database", but that's too trivial to make
* a good example.
*/

/* Start a transaction block */
res = PQexec(conn, "BEGIN");
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
fprintf(stderr, "BEGIN command failed: %s", PQerrorMessage(conn));
PQclear(res);
exit_nicely(conn);
}

/*
* Should PQclear PGresult whenever it is no longer needed to avoid memory
* leaks
*/
PQclear(res);

/*
* Fetch the article codes from base_a_artico
*/
res = PQexec(conn, "DECLARE articoli CURSOR FOR select cdart from "
"base_a_artico ORDER BY cdart");
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
fprintf(stderr, "DECLARE CURSOR failed: %s", PQerrorMessage(conn));
PQclear(res);
exit_nicely(conn);
}
PQclear(res);

res = PQexec(conn, "FETCH ALL in articoli");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
fprintf(stderr, "FETCH ALL failed: %s", PQerrorMessage(conn));
PQclear(res);
exit_nicely(conn);
}

/* next, run the per-article query for each row */
for (i = 0; i < PQntuples(res); i++)
{
read_rigpia(conn, PQgetvalue(res, i, 0));
}

PQclear(res);

/* close the portal ... we don't bother to check for errors ... */
res = PQexec(conn, "CLOSE articoli");
PQclear(res);

/* end the transaction */
res = PQexec(conn, "END");
PQclear(res);

/* close the connection to the database and cleanup */
PQfinish(conn);

return 0;
}

int read_rigpia(PGconn* conn, char* cdart)
{
PGresult *res;
char sql[1024];
int i;
char* dtfab;
char* sum;

memset(sql,0,sizeof(sql));
/* snprintf() bounds the write to sizeof(sql); cdart is not escaped here */
snprintf(sql, sizeof(sql),
"DECLARE rigpia CURSOR FOR select dtfab,sum(qtfan-qtpro) "
"from adp_d_rigpia where flsta='' and cdart='%s' and qtfan>qtpro and cddpu "
"not in ('04','05','06','07','08','09',"
"'91','92','93','94','95','96','97','98','A0','B8','C2','LF','SC') group by "
"dtfab", cdart);

res = PQexec(conn, sql);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
fprintf(stderr, "DECLARE CURSOR rigpia failed: %s *** %s",
PQerrorMessage(conn),sql);
PQclear(res);
return 0;
}

PQclear(res);
res = PQexec(conn, "FETCH ALL in rigpia");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
fprintf(stderr, "FETCH ALL failed in rigpia: %s",
PQerrorMessage(conn));
PQclear(res);
return 0;
}

/* next, read the rows (the values are fetched but not printed) */
for (i = 0; i < PQntuples(res); i++)
{
dtfab = PQgetvalue(res, i, 0);
sum = PQgetvalue(res, i, 1);
}

PQclear(res);
res = PQexec(conn, "CLOSE rigpia");
PQclear(res);
return 1; /* success; the caller ignores this value */
}

Regards,
Andrea
