Re: A test for replay of regression tests

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A test for replay of regression tests
Date: 2022-08-04 01:13:00
Message-ID: 20220804011300.GJ19644@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 04, 2022 at 09:24:24AM +1200, Thomas Munro wrote:
> On Thu, Aug 4, 2022 at 3:30 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > On Thu, Dec 09, 2021 at 12:10:23PM +1300, Thomas Munro wrote:
> > > This adds 2 whole minutes to the recovery check, when running with the
> > > Windows serial-only scripts on Cirrus CI (using Andres's CI patches).
> > > For Linux it adds ~20 seconds to the total of -j8 check-world.
> > > Hopefully that's time well spent, because it adds test coverage for
> > > all the redo routines, and hopefully soon we won't have to run 'em in
> > > series on Windows.
> >
> > Should 027-stream-regress be renamed to something that starts earlier ?
> > Off-list earlier this year, Andres referred to 000.

See also:
https://www.postgresql.org/message-id/20220213220709.vjz5rziuhfdpqxrg@alap3.anarazel.de

> Do you have any data on improved times from doing that?
>
> I have wondered about moving it into 001_stream_rep.pl.

The immediate motive for raising the question is due to working on your cygwin
patch (where I've set PROVE_FLAGS=-j3). The last invocation I have opened ends
like:

[20:46:47.577] [13:46:47] t/026_overwrite_contrecord.pl ........ ok 10264 ms ( 0.02 usr 0.02 sys + 11.25 cusr 35.95 csys = 47.24 CPU)
[20:47:08.087] [13:47:08] t/028_pitr_timelines.pl .............. ok 13153 ms ( 0.00 usr 0.00 sys + 4.03 cusr 14.79 csys = 18.82 CPU)
[20:47:08.999] [13:47:09] t/029_stats_restart.pl ............... ok 12631 ms ( 0.00 usr 0.02 sys + 7.40 cusr 23.30 csys = 30.71 CPU)
[20:47:34.353] [13:47:34] t/031_recovery_conflict.pl ........... ok 11337 ms ( 0.00 usr 0.00 sys + 3.84 cusr 11.82 csys = 15.66 CPU)
[20:47:35.070] [13:47:35] t/030_stats_cleanup_replica.pl ....... ok 14054 ms ( 0.02 usr 0.00 sys + 7.64 cusr 25.02 csys = 32.68 CPU)
[20:48:04.887] [13:48:04] t/032_relfilenode_reuse.pl ........... ok 12755 ms ( 0.00 usr 0.00 sys + 3.36 cusr 11.57 csys = 14.93 CPU)
[20:48:42.055] [13:48:42] t/033_replay_tsp_drops.pl ............ ok 43529 ms ( 0.00 usr 0.00 sys + 12.29 cusr 41.43 csys = 53.71 CPU)
[20:50:02.770] [13:50:02] t/027_stream_regress.pl .............. ok 198408 ms ( 0.02 usr 0.06 sys + 44.92 cusr 142.42 csys = 187.42 CPU)
[20:50:02.771] [13:50:02]
[20:50:02.771] All tests successful.
[20:50:02.771] Files=33, Tests=411, 402 wallclock secs ( 0.16 usr 0.27 sys + 138.03 cusr 441.56 csys = 580.01 CPU)

If 027 had been started sooner, this test might have finished up to 78sec
earlier. If lots of tests are added in the future, maybe it won't matter, but
it seems like it does now.

As I understand, checks are usually parallelized by "make -j" and not by
"prove". In that case, starting a slow test later doesn't matter. But it'd be
better for anyone who runs tap tests manually, and (I think) for meson.

As a one-off test on localhost:
time make check -C src/test/recovery
=> 11m42,790s
time make check -C src/test/recovery PROVE_FLAGS=-j2
=> 7m56,315s

After renaming it to 001:
time make check -C src/test/recovery
=> 11m33,887s (~same)
time make check -C src/test/recovery PROVE_FLAGS=-j2
=> 6m59,969s

I don't know how it affect the buildfarm (but I think that's not optimized
primarily for speed anyway).

--
Justin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-08-04 01:35:46 Re: Cleaning up historical portability baggage
Previous Message Michael Paquier 2022-08-04 01:09:27 Re: [PATCH] postgresql.conf.sample comment alignment.