ci/cfbot: run windows tests under a timeout

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: ci/cfbot: run windows tests under a timeout
Date: 2022-02-02 18:31:07
Message-ID: 20220202183107.pb3jl5qg33ik6iii@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On Windows, cfbot currently hangs / times out regularly. Presumably this is due
to the issues discussed in https://postgr.es/m/CA%2BhUKG%2BG5DUNJfdE-qusq5pcj6omYTuWmmFuxCvs%3Dq1jNjkKKA%40mail.gmail.com
which led to reverting [1] some networking-related changes everywhere but
master.

But it's hard to tell - because the entire test task times out, we don't get
to see debugging information.

In earlier versions of the CI script I had tests run under a timeout command
that killed the entire test run. I found that to be helpful when working on
AIO. But I removed that, in an attempt to simplify things, before
submitting. Turns out it was needed complexity.

The attached patch adds a timeout (using git's timeout binary) to all vcregress
invocations. I've not re-added it to the other OSs, but I'm on the fence about
doing so.
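
As an illustration only (this is not the attached patch, which wraps the
vcregress invocations with git's timeout binary), the idea of running a test
command under a hard time limit can be sketched in Python. The function name
run_with_timeout is hypothetical, and the exit code 124 is borrowed from the
convention of the coreutils timeout command.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or 124 if it ran longer than timeout_s.

    Hypothetical helper for illustration. subprocess.run() kills the direct
    child and waits for it when the timeout expires; processes started by
    that child are not necessarily cleaned up.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # Report the timeout so the log shows which step hung.
        print(f"timed out after {timeout_s}s: {cmd}", file=sys.stderr)
        return 124  # exit status the coreutils timeout command uses on timeout
```

Because the wrapper returns instead of letting the whole task be cancelled,
later steps (such as collecting logs) still get a chance to run.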

The diff is a bit larger than one might think necessary: YAML doesn't like % -
from the Windows command variable syntax - at the start of an unquoted
string...

Separately, we should probably make Cluster.pm::psql() etc. always use a
"fallback" timeout (rather than just when the test writer thought it
necessary). Or perhaps Utils.pm's INIT should set up a timer after which an
individual test is terminated?
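
A rough sketch of the second idea, a timer armed once at startup that ends the
test if it runs too long, written in Python purely for illustration. The real
change would live in the Perl test modules; install_watchdog and its
parameters are hypothetical names, not an existing API.

```python
import os
import sys
import threading


def install_watchdog(timeout_s, on_expire=None):
    """Start a daemon timer that terminates the process after timeout_s.

    Hypothetical helper for illustration. on_expire lets a caller substitute
    another action (for example, dumping diagnostics) for the default exit.
    """
    def expire():
        print(f"test exceeded {timeout_s}s, terminating", file=sys.stderr)
        sys.stderr.flush()
        os._exit(1)  # exit immediately, even if the main thread is blocked

    timer = threading.Timer(timeout_s, on_expire or expire)
    timer.daemon = True  # an armed timer must not keep a finished test alive
    timer.start()
    return timer
```

A test that finishes normally just exits (or calls cancel() on the returned
timer). A hung test is ended by the timer, so the failure is attributed to
that one test rather than to the whole task.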

Greetings,

Andres Freund

[1]
commit 75674c7ec1b1607e7013b5cebcb22d9c8b4b2cb6
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: 2022-01-25 12:17:40 -0500

Revert "graceful shutdown" changes for Windows, in back branches only.

This reverts commits 6051857fc and ed52c3707, but only in the back
branches. Further testing has shown that while those changes do fix
some things, they also break others; in particular, it looks like
walreceivers fail to detect walsender-initiated connection close
reliably if the walsender shuts down this way. We'll keep trying to
improve matters in HEAD, but it now seems unwise to push these changes
into stable releases.

Discussion: https://postgr.es/m/CA+hUKG+OeoETZQ=Qw5Ub5h3tmwQhBmDA=nuNO3KG=zWfUypFAw@mail.gmail.com

Attachment: 0001-wip-time-out-tests-on-windows.patch (text/x-diff, 4.5 KB)
