Re: Finding cause of test fails on the cfbot site

From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Finding cause of test fails on the cfbot site
Date: 2021-02-22 03:34:47
Message-ID: 20210222033447.qvox6vtrwbpkd33b@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-02-17 15:18:02 -0500, Andrew Dunstan wrote:
> yeah. The cfbot runs check-world which makes it difficult for it to know
> which log files to show when there's an error. That's a major part of
> the reason the buildfarm runs a much finer grained set of steps.

I really think we need a better solution for this across the different
use-cases of running tests. For development, parallel check-world is
important for a decent hack-test loop. But I waste a fair bit of time
scrolling back to find the original source of failures. And on the
buildfarm we waste a significant amount of time by limiting parallelism
due to the non-parallel sequence of finer grained steps.

And it's not just about logs - even just easily seeing the first
reported test failure without needing to search through large amounts of
text would be great.

With, um, more modern buildtools (e.g. ninja) you'll at least get the
last failure displayed at the end, instead of seeing a lot of other
things after it like with make.

My suspicion is that, given the need to have this work for both msvc and
make, writing an in-core test-runner script is the only real option to
improve upon the current situation.

For make it'd not be hard to add a recursive 'listchecks' target listing
the individual tests that need to be run. Hacking up vcregress.pl to do
that, instead of what it currently does, shouldn't be too hard either.

Once there's a list of commands that need to be run it's not hard to
write a loop in perl that runs up to N tests in parallel, saving their
output. That then allows the failing test reports to be displayed at
the end.
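
As a rough illustration of that loop (in Python rather than perl, purely
for brevity), a minimal sketch could look like the following. The names
run_one, run_tests and report_failures are made up for this example, the
two commands in the demo are placeholders, and shell=True assumes a
POSIX shell. A real runner would take its command list from something
like the 'listchecks' target described above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd):
    """Run one test command, capturing stdout and stderr together."""
    proc = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return cmd, proc.returncode, proc.stdout


def run_tests(commands, jobs=4):
    """Run up to `jobs` commands at once.

    Returns one (cmd, returncode, output) tuple per command, in the
    same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, commands))


def report_failures(results):
    """Build a summary that shows only the failed tests, with a tally last."""
    failed = [r for r in results if r[1] != 0]
    lines = []
    for cmd, rc, output in failed:
        lines.append(f"FAILED (exit {rc}): {cmd}")
        lines.append(output.rstrip())
    lines.append(f"{len(results) - len(failed)} passed, {len(failed)} failed")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder commands standing in for real test invocations.
    results = run_tests(["echo ok", "echo boom; exit 3"], jobs=2)
    print(report_failures(results))
```

Because the output of each command is held back until everything has
finished, the failure reports end up together at the bottom instead of
being interleaved with the output of passing tests.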

If we then also add a convention that each test outputs something like
TESTLOG: path/to/logfile
...
it'd not be hard to add support for the test runner to list the files
that cfbot et al should output.
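
To make the idea concrete, here is a small Python sketch of how a runner
might pick up such lines from captured test output. The exact prefix
"TESTLOG: " follows the proposal above, while the function name
collect_testlogs and the de-duplication behaviour are assumptions made
for the example, not an existing convention.

```python
TESTLOG_PREFIX = "TESTLOG: "


def collect_testlogs(output):
    """Return log paths announced on 'TESTLOG: <path>' lines.

    Paths are returned in the order they first appear; repeats are
    dropped so each file is listed once.
    """
    paths = []
    for line in output.splitlines():
        if line.startswith(TESTLOG_PREFIX):
            path = line[len(TESTLOG_PREFIX):].strip()
            if path and path not in paths:
                paths.append(path)
    return paths
```

A runner could call this only on the output of failed tests, and hand
the resulting list to whatever uploads or prints the log files.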

Looking around the tree, the most annoying bit to implement something
like this is that things below src/bin/, src/interfaces, src/test,
src/pl implement their own check, installcheck targets. Given the number
of these that just boil down to a variant of

check:
	$(pg_regress_check)
	$(prove_check)
installcheck:
	$(pg_regress_installcheck)

it seems we should lift the REGRESS and TAP_TESTS specific logic in
pgxs.mk up into src/Makefile.global. Which then would make something
like a global listchecks target easy.

Greetings,

Andres Freund
