Timeout control within tests

From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Timeout control within tests
Date: 2022-02-18 05:28:42
Message-ID: 20220218052842.GA3627003@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 05, 2021 at 03:55:20PM -0500, Tom Lane wrote:
> We have, almost invariably, regretted it when we tried to use short
> timeouts in test cases.

> More generally, sometimes people want to do things like run a test
> under valgrind. So it's not just "underpowered machines" that may
> need a generous timeout. Even if we did reduce the default, I'd
> want a way (probably via an environment variable, cf PGCTLTIMEOUT)
> to kick it back up.

I have a few use cases for that:

1. My buildfarm members have more and more competition for CPU and I/O.
Examples where I suspect animal slowness caused a 180s timeout:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2021-04-11%2003%3A11%3A39
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-07-26%2004%3A38%3A29

2. When I'm developing a change and I locally break a test in a way that leads
to a timeout, I like to be able to lower that timeout.

3. I want more tests to use the right timeout from the start. Low-timeout
tests appear at least a few times per year:
d03eeab Mon May 31 00:29:58 2021 -0700 Raise a timeout to 180s, in test 010_logical_decoding_timelines.pl.
388b959 Sat Feb 27 07:02:56 2021 -0800 Raise a timeout to 180s, in contrib/test_decoding.
08dde1b Tue Dec 22 11:10:12 2020 -0500 Increase timeout in 021_row_visibility.pl.
8961355 Sat Apr 25 18:45:27 2020 -0700 Raise a timeout to 180s, in test 003_recovery_targets.pl.
1db439a Mon Dec 10 20:15:42 2018 -0800 Raise some timeouts to 180s, in test code.

I propose to have environment variable PG_TEST_TIMEOUT_DEFAULT control the
timeout used in the places that currently hard-code 180s. TAP tests should
retrieve the value via $PostgreSQL::Test::Utils::timeout_default. pg_regress
tests should retrieve it via \getenv. I would like to back-patch the TAP
part, for cause (1). (The pg_regress part hasn't been a buildfarm problem,
and \getenv is new in v15.) Patches attached. I considered and excluded
other changes, for now:

a. I considered consolidating this with PGISOLATIONTIMEOUT (default 300). One
could remove the older variable entirely or make isolationtester use the
first-available of [PGISOLATIONTIMEOUT, 2 * PG_TEST_TIMEOUT_DEFAULT, 360].
Does anyone have an opinion on what, if anything, to do there?

b. I briefly made stats.sql accept PG_TEST_TIMEOUT_DEFAULT to override its
hard-coded 30s timeout. However, a higher timeout won't help when a UDP
buffer fills. If the test were structured to observe evidence of a vacant
UDP buffer before proceeding with the test stat messages, a higher timeout
could make more sense. I added a comment.

c. One could remove timeout-duration function arguments (e.g. from
pg_recvlogical_upto) and just have the function consult timeout_default.
This felt like highly-optional refactoring.

Attachment Content-Type Size
tap-timeout-default-var-v1.patch text/plain 21.2 KB
regress-timeout-default-var-v1.patch text/plain 3.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-02-18 05:48:25 Re: Timeout control within tests
Previous Message Masahiko Sawada 2022-02-18 05:22:52 Re: Fix a comment in worker.c