Re: Adding CI to our tree

From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: Adding CI to our tree
Date: 2022-02-13 22:07:09
Message-ID: 20220213220709.vjz5rziuhfdpqxrg@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-02-13 15:42:13 -0600, Justin Pryzby wrote:
> > Note that prove unfortunately serializes the test output to be in the order it
> > started them, even when tests run concurrently. Extremely unhelpful, but ...
>
> Are you sure?

Somewhat. I think it's a question of the prove version and some autodetection
of what type of environment prove is running in (stdin/stdout/stderr). I don't
remember the details, but at some point I pinpointed the source of the
serialization, and verified that parallelization makes a significant
difference in runtime even without being easily visible :(. But this is all
vague memory, so I might be wrong.

Reminds me that somebody (ugh, me???) should fix the perl > 5.26
incompatibilities on windows, then we'd also get a newer prove...

> > One nice bit is that the output is a *lot* easier to read.
> >
> > You could try improving the total time by having prove remember slow tests and
> > use that file to run the slowest tests first next time. --state slow,save or
> > such I believe. Of course we'd have to save that state file...
>
> In a test, this hurt rather than helped (13m 42s).
> https://cirrus-ci.com/task/6359167186239488
>
> I'm not surprised - it makes sense to run 10 fast tests at once, but usually
> doesn't make sense to run 10 slow tests at once (at least a couple of
> which are doing something intensive). It was faster (12m16s) to do it
> backwards (fastest tests first).
> https://cirrus-ci.com/task/5745115443494912

Hm.

I know I saw significant reduction in test times locally with meson by
starting slow tests earlier, because they're the limiting factor for the
*overall* test runtime - but I have more resources than on cirrus. Even
locally on a windows VM, with the current buildsystem, I found that moving 027
earlier within recoverycheck reduced the test time.

But it's possible that with all tests being scheduled concurrently, starting
the slow tests early leads to sufficient resource overcommit to be
problematic.
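
To illustrate the scheduling argument, here is a rough sketch that compares
slowest-first and fastest-first ordering under a fixed number of concurrent
slots. The durations and the slot count are made up for illustration, and the
model assumes every test takes the same time regardless of what else is
running, so it cannot show the overcommit effect described above, only why
ordering matters when resources are sufficient.

```python
import heapq

def makespan(durations, workers):
    """Greedy list scheduling: each test starts on the slot that frees up first."""
    free_at = [0.0] * workers          # time at which each slot becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + d
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish

# Hypothetical test durations in seconds (not measured from any real run).
durations = [60, 5, 5, 5, 5, 5, 5, 40, 10, 10]

slow_first = makespan(sorted(durations, reverse=True), workers=4)
fast_first = makespan(sorted(durations), workers=4)
print(slow_first, fast_first)  # 60.0 70.0
```

With these numbers the slowest test alone bounds the slow-first run, whereas
fast-first starts it late and pushes the end of the run out. Once concurrent
slow tests slow each other down, as they might on cirrus, the comparison can
flip.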

> BTW, does it make sense to remove test_regress_parallel_script ? The
> pg_upgrade run would do the same things, no ? If so, it might make sense to
> run that first. OTOH, you suggested to run the upgrade tests with checksums
> enabled, which seems like a good idea.

No, I don't think so. The main regression tests are by far the most important
thing during normal development. Relying only on main regression test runs
embedded in other tests, with output and config that differ from the main
regression test run, is imo just confusing.

Greetings,

Andres Freund
