Re: FW: Intermittent Stats Failiures: firefly: HEAD

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Larry Rosenman" <lrosenman(at)pervasive(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: FW: Intermittent Stats Failiures: firefly: HEAD
Date: 2006-01-11 20:28:41
Message-ID: 27826.1137011321@sss.pgh.pa.us
Lists: pgsql-hackers

"Larry Rosenman" <lrosenman(at)pervasive(dot)com> writes:
>> Ever since the stats collector changes, I've seen intermittent
>> failures on 'firefly' in the buildfarm.

Yeah, you're not the only one. We haven't figured out what's causing
them. But while fooling with Joachim Wieland's pg_sleep patch just
now, I was struck by an idea: on machines where select() is
interruptible by signals, it is possible that the do_sleep() function
won't wait as long as specified. This could easily cause the observed
regression diff, if the test doesn't wait long enough for the stats
collector to update the stats.

It's not immediately obvious what signal might be arriving at the
backend, given that there aren't supposed to be any other database
operations going on. It's barely possible that a SIGUSR1 (sinval
catchup interrupt) could be generated here, if one of the previous
group of tests was still in the process of shutting down its backend.
So I'm not sure about this theory ... but at least it's a theory.

If the theory is correct then the just-committed pg_sleep patch
should provide a permanent solution. We'll have to wait and see
if we see any more of those errors.

If we don't see any more such errors in HEAD for a while, it might
be worth back-patching the implementation of pg_sleep into the
older branches' regression tests, so we don't keep seeing intermittent
regression failures in them either.

regards, tom lane
