| From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, michael(dot)banck(at)credativ(dot)de |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Maybe BF "timedout" failures are the client script's fault? |
| Date: | 2026-01-10 14:34:16 |
| Message-ID: | 05efa923-a1b2-48b5-b9ec-9abf8758f720@dunslane.net |
| Lists: | pgsql-hackers |
On 2026-01-09 Fr 3:41 PM, Tom Lane wrote:
> We've been assuming that all the "timedout" failures on BF member
> fruitcrow were due to some wonkiness in the GNU/Hurd platform.
> I got suspicious about that though after noticing that there are
> a small number of such failures on other animals, eg [1][2][3].
> In each case, the failure message claims it waited a good long
> time, which is at variance with the actually observed runtime.
> For instance [1] says "timed out after 14400 secs", but the
> actual total test runtime is only 01:24:28 according to the
> summary at the top of the page.
>
> Looking into the buildfarm client, I realized that it's assuming that
> "sleep($wait_time)" is sufficient to wait for $wait_time seconds.
> However, the Perl docs point out that sleep() can be interrupted by a
> signal. So now I'm suspicious that many of these failures are caused
> by a stray signal waking up the wait_timeout thread prematurely.
> GNU/Hurd might just be more prone to that than other platforms.
>
> I propose the attached patch to the BF client to try to make this
> more robust.
>
> regards, tom lane
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ovenbird&dt=2025-11-14%2009%3A21%3A05
> [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2025-10-17%2018%3A32%3A07
> [3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=opaleye&dt=2026-01-08%2023%3A07%3A37
>
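A minimal Python sketch of the general technique Tom describes. Rather than trusting a single sleep call to last the full interval, it loops against a deadline on a monotonic clock until the whole wait has elapsed. This is not the attached patch, which is Perl and isn't reproduced here. The function name `wait_full` and its injectable `sleep`/`clock` parameters are hypothetical. Python's own `time.sleep` already resumes after a signal since Python 3.5 (PEP 475), so the loop matters mostly for primitives like Perl's `sleep()` that can return early.

```python
import time


def wait_full(wait_time: float, sleep=time.sleep, clock=time.monotonic) -> float:
    """Block for at least wait_time seconds, even if individual sleeps return early.

    Hypothetical helper illustrating the "sleep until a deadline" pattern.
    Returns the total time actually waited, as measured by `clock`.
    """
    start = clock()
    deadline = start + wait_time
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            break
        # A single sleep may return early (for example, when a signal
        # interrupts it), so re-check the deadline and sleep again for
        # whatever time is still left.
        sleep(remaining)
    return clock() - start
```

Injecting `sleep` and `clock` makes it easy to simulate a wait that keeps getting cut short, such as a fake sleep that advances the clock by only half of what was requested. The loop still returns only once the full interval has passed.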
The patch seems reasonable on its face, but I doubt it's the issue.
Rather, I think what's happening here is that a test is hanging silently
and lastcommand.log's mtime doesn't get updated, causing the run duration
to be misreported. So, in addition to the above, I have added some code
to update that timestamp if the file exists (which should only be the
case with a timeout).
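To make the misreporting concrete, here is a small Python sketch. It assumes, as the message implies, that the reported duration is derived from `lastcommand.log`'s mtime. `touch_if_exists` and `apparent_duration` are hypothetical helpers, not the client's real Perl code (see the linked commit for that). If a test hangs without writing anything, the mtime stays at the time of the last write and the apparent duration comes out too short. Refreshing the timestamp when the timeout fires brings it in line with the actual wait.

```python
import os
import time

LOG_NAME = "lastcommand.log"  # file name taken from the message


def touch_if_exists(path: str) -> bool:
    """Set path's access/modification times to now, but only if it already exists.

    Hypothetical helper mirroring the idea of refreshing lastcommand.log's
    timestamp on timeout. It never creates the file.
    """
    if not os.path.exists(path):
        return False
    os.utime(path, None)  # None means "use the current time"
    return True


def apparent_duration(start_time: float, path: str) -> float:
    """Duration inferred from the file's mtime (assumed reporting method)."""
    return os.path.getmtime(path) - start_time
```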
See
https://github.com/PGBuildFarm/client-code/commit/e5d67a35a0136a53e441fccf0ecc9b1b6322526c
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com