Re: Intermittent "make check" failures on hyena

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Intermittent "make check" failures on hyena
Date: 2006-08-06 16:47:14
Message-ID: 1325.1154882834@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> AFAIK it is not possible for Postgres itself to cause a "connection
>> refused" failure --- that's a kernel-level errno. So what's going on
>> here? The only idea that comes to mind is that this version of Solaris
>> has some very low limit on SOMAXCONN, and when the timing is just so
>> it's bouncing connection requests because several of them arrive faster
>> than the postmaster can fork off children. Googling suggests that there
>> are versions of Solaris with SOMAXCONN as low as 5 :-( ... but other
>> pages say that the default is higher, so this theory might be wrong.

> This is the box that Sun donated, btw.
> I get: ndd /dev/tcp tcp_conn_req_max_q => 128
> Is that the Solaris equivalent of SOMAXCONN? That's low, maybe, but not
> impossibly low.

Yeah, I found that variable name in googling. If it's 128 then there's
no way that it's causing the problem --- you'd have to assume a value in
the single digits to explain the observed failures.
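
To make that arithmetic concrete, here is a rough toy model of a listen queue. It is only an illustration, and its assumptions are not from this thread: a burst of 20 simultaneous connection attempts (roughly a large parallel test group), a postmaster that accepts one connection per tick, and a kernel that refuses anything beyond the backlog outright. Real TCP stacks differ in the details, and the function name is made up for the example.

```python
def count_refused(arrivals_per_tick, backlog, accepts_per_tick):
    """Toy model: count connection attempts bounced by a full listen queue.

    arrivals_per_tick: connection attempts arriving in each time step
    backlog:           maximum number of pending (not yet accepted) connections
    accepts_per_tick:  connections the server accepts at the end of each step
    """
    queued = 0
    refused = 0
    for arriving in arrivals_per_tick:
        # Only as many new connections fit as there is room left in the queue.
        room = backlog - queued
        admitted = min(arriving, room)
        refused += arriving - admitted
        queued += admitted
        # The server then drains part of the queue (assumed fixed rate).
        queued -= min(queued, accepts_per_tick)
    return refused


# Assumed burst of 20 simultaneous connects, one accept per tick.
print(count_refused([20], backlog=128, accepts_per_tick=1))
print(count_refused([20], backlog=5, accepts_per_tick=1))
```

In this model a backlog of 128 refuses nothing, whereas a backlog of 5 bounces 15 of the 20 attempts, which is the sort of value the failures would require.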

I see one occurrence in the 8.1 branch on hyena, but the failure
probability seems to have jumped way up in HEAD since we put in the
C-coded pg_regress. This lends weight to the idea that it's a
timing-related issue, because pg_regress.c is presumably much faster
at forking off a parallel gang of psqls than the shell script was;
and it's hard to see what else about the pg_regress change could be
affecting the psqls' ability to connect once forked.

We probably need to get some Solaris experts involved in diagnosing
what's happening. Judging by the buildfarm results you should be able
to replicate it fairly easily by doing "make installcheck-parallel"
repeatedly.
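
A minimal sketch of such a repeat-until-failure loop follows, in Python. The helper name and the run limit are my own illustration, not anything in the tree; it just reruns a command until the exit status is nonzero.

```python
import subprocess
import sys


def first_failing_run(command, max_runs):
    """Run `command` up to `max_runs` times.

    Returns the 1-based number of the first run that exits with a nonzero
    status, or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            print(f"run {attempt} failed with status {result.returncode}",
                  file=sys.stderr)
            return attempt
    return None
```

Something like first_failing_run(["make", "installcheck-parallel"], 50), run from the top of a build tree against an already running installation, would stop at the first failing pass so that regression.diffs can be inspected before the next run overwrites it.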

regards, tom lane
