Possible fix for occasional failures on castoroides etc

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, dpage(at)pgadmin(dot)org
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Possible fix for occasional failures on castoroides etc
Date: 2012-09-16 16:04:16
Message-ID: 12502.1347811456@sss.pgh.pa.us
Lists: pgsql-hackers

It's annoying that the buildfarm animals running on older versions of
Solaris randomly fail with "Connection refused" errors, such as in
today's example:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=castoroides&dt=2012-09-15%2015%3A42%3A52

I believe what's probably happening there is that the kernel has a small
hard-wired limit on the length of the postmaster's accept queue, and you
get this failure if too many connection attempts arrive faster than the
postmaster can service them. If that theory is correct, we could
probably prevent these failures by reducing the number of tests run in
parallel, which could be done by adding say
MAX_CONNECTIONS=5
to the environment in which the regression tests run. I'm not sure
though if that's "build_env" or some other setting for the buildfarm
script --- Andrew?
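
To illustrate why a lower MAX_CONNECTIONS should reduce the burst of simultaneous connection attempts, here is a rough Python sketch of the batching effect: a parallel test group larger than the limit is run in successive batches rather than all at once. This is only an illustration under assumptions, not pg_regress's actual code; the function name, the test names, and the group size of 12 are all hypothetical.

```python
def batch_parallel_group(tests, max_connections):
    """Split one parallel test group into batches of at most max_connections.

    A limit of 0 (or less) is treated here as "no limit", so the whole
    group is started together. This is an assumption of the sketch.
    """
    if max_connections <= 0 or len(tests) <= max_connections:
        return [list(tests)]
    return [list(tests[i:i + max_connections])
            for i in range(0, len(tests), max_connections)]


# Hypothetical parallel group of 12 tests, capped at 5 concurrent connections.
group = [f"test_{n}" for n in range(1, 13)]
batches = batch_parallel_group(group, 5)
print([len(b) for b in batches])  # prints [5, 5, 2]
```

With the cap in place, at most 5 connection attempts reach the postmaster at roughly the same moment instead of 12, which is what would matter if the accept queue is as short as suspected.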

regards, tom lane
