Re: Maybe BF "timedout" failures are the client script's fault?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Banck <mbanck(at)gmx(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Maybe BF "timedout" failures are the client script's fault?
Date: 2026-01-09 21:42:22
Message-ID: 2430115.1767994942@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Banck <mbanck(at)gmx(dot)net> writes:
> On Fri, Jan 09, 2026 at 03:41:03PM -0500, Tom Lane wrote:
>> Looking into the buildfarm client, I realized that it's assuming that
>> "sleep($wait_time)" is sufficient to wait for $wait_time seconds.
>> However, the Perl docs point out that sleep() can be interrupted by a
>> signal. So now I'm suspicious that many of these failures are caused
>> by a stray signal waking up the wait_timeout thread prematurely.

> That might be the case for those other failures, but unfortunately, I
> think the fruitcrow failures are really because it gets stuck endlessly
> in the test_shm_mq test (it is always that one) and only the test
> timeout kicks it out.

If it's always the same test, then yeah that's evidence against
my theory (at least for fruitcrow's failures).
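
To make the sleep() concern quoted above concrete: a wait that has to last
the full interval can track a deadline and go back to sleep for whatever
time remains after an early wake-up. What follows is only a sketch in
Python, not the buildfarm client's Perl code; sleep_full and its sleep_fn
and clock parameters are hypothetical names chosen for illustration.
Python's own time.sleep() already resumes after a signal since version 3.5
(PEP 475), so the early wake-up has to be modeled by passing in a sleep
function that may return before the requested time is up.

```python
import time


def sleep_full(wait_time, sleep_fn=time.sleep, clock=time.monotonic):
    """Block until at least wait_time seconds have passed on `clock`.

    sleep_fn may return early (as an interrupted sleep would); the loop
    then sleeps again for the time still remaining. Returns the number
    of times sleep_fn was called.
    """
    # Fix the deadline once, so early wake-ups cannot shorten the wait.
    deadline = clock() + wait_time
    calls = 0
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return calls
        sleep_fn(remaining)
        calls += 1
```

A monotonic clock is used for the deadline so that wall-clock adjustments
during the wait do not change its length. Whether the buildfarm client
needs anything like this depends on whether stray signals really are
reaching the wait_timeout thread, which is not established here.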

> I've run that test manually quite a lot and either it finishes in 10-15
> seconds, or (presumably) never. This is not really easy to see in the
> public buildfarm logs (at least I can't find it on a quick glance), but
> I've routinely checked the log timestamps of the runs, and they really
> take one hour (wait_timeout) in the case of a hang.

Hmm. Then why is the BF report showing that the total runtime is
nowhere near that? I wonder how those times are gathered ...

regards, tom lane
