Re: Recording test runtimes with the buildfarm

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Recording test runtimes with the buildfarm
Date: 2020-06-11 15:27:57
Message-ID: CABUevExDEgsTgEXfOUCWBNteFAmb4JJu0yCbuBZxxUV7LrOeVA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 11, 2020 at 4:56 PM Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:

>
> On 6/11/20 10:21 AM, Stephen Frost wrote:
> > Greetings,
> >
> > * David Rowley (dgrowleyml(at)gmail(dot)com) wrote:
> >> On Thu, 11 Jun 2020 at 10:02, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> >>>> I've been doing that in a little database that pulls down the results
> >>>> and analyses them with primitive regexes. First I wanted to know the
> >>>> pass/fail history for each individual regression, isolation and TAP
> >>>> script, then I wanted to build something that could identify tests
> >>>> that are 'flapping', and work out when they started and stopped
> >>>> flapping etc. I soon realised it was all too noisy, but then I
> >>>> figured that I could fix that by detecting crashes. So I classify
> >>>> every top level build farm run as SUCCESS, FAILURE or CRASH. If the
> >>>> top level run was CRASH, then I can disregard the individual per
> >>>> script results, because they're all BS.
> >>> If you can pin the crash on a particular test script, it'd be useful
> >>> to track that as a kind of failure. In general, though, both crashes
> >>> and non-crash failures tend to cause collateral damage to later test
> >>> scripts --- if you can't filter that out then the later scripts will
> >>> have high false-positive rates.
> >> I guess the fact that you've both needed to do analysis on individual
> >> tests shows that there might be a call for this beyond just recording
> >> the test's runtime.
> >>
> >> If we had a table that stored the individual test details, pass/fail
> >> and just stored the timing information along with that, then, even if
> >> the timing was unstable, it could still be useful for some analysis.
> >> I'd be happy enough even if that was only available as a csv file
> >> download. I imagine the buildfarm does not need to provide us with
> >> any tools for doing analysis on this. Ideally, there would be some
> >> run_id that we could link it back to the test run which would give us
> >> the commit SHA, and the animal that it ran on. Joining to details
> >> about the animal could be useful too, e.g. perhaps a certain test
> >> always fails on 32-bit machines.
> >>
> >> I suppose that maybe we could modify pg_regress to add a command line
> >> option to have it write out a machine-readable file, e.g:
> >> testname,result,runtime\n, then just have the buildfarm client ship
> >> that off to the buildfarm server to record in the database.
> > That seems like it'd be the best approach to me, though I'd defer to
> > Andrew on it.
> >
> > By the way, if you'd like access to the buildfarm archive server where
> > all this stuff is stored, that can certainly be arranged, just let me
> > know.
> >
>
>
> Yeah, we'll need to work out where to stash the file. The client will
> pick up anything in src/regress/log for "make check", but would need
> adjusting for other steps that invoke pg_regress. I'm getting close to
> cutting a new client release, but I can delay it till we settle this.
>
>
> On the server side, we could add a table with a key of <animal,
> snapshot, branch, step, testname> but we'd need to make sure those test
> names were unique. Maybe we need a way of telling pg_regress to prepend
> a module name (e.g. btree_gist or plperl) to the test name.
>

It seems pretty trivial to, for example, get all the steps and their
timings out of check.log with a regexp. I just used

    '^(?:test)?\s+(\S+)\s+\.\.\. ok\s+(\d+) ms$'

as the regexp. Running that against a few hundred build runs in the db
generally looks fine, though I didn't look into it in detail.
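
As a rough illustration of the regexp approach, here is a minimal Python
sketch. The regexp is the one quoted above, unchanged. The function name
parse_timings and the sample log text are made up for the example, and the
sample only approximates what pg_regress writes to check.log. Note that the
pattern only captures tests reported as "ok", so failed tests are skipped.

```python
import re

# The regexp from the message, unchanged. MULTILINE makes ^ and $ match
# at the start and end of each log line rather than of the whole text.
STEP_RE = re.compile(r'^(?:test)?\s+(\S+)\s+\.\.\. ok\s+(\d+) ms$', re.MULTILINE)


def parse_timings(log_text):
    """Return [(testname, milliseconds), ...] for every test reported as ok."""
    return [(name, int(ms)) for name, ms in STEP_RE.findall(log_text)]


# Made-up sample that approximates pg_regress output (an assumption,
# not copied from a real buildfarm run).
SAMPLE = (
    "parallel group (2 tests):  boolean char\n"
    "     boolean                      ... ok           12 ms\n"
    "     char                         ... FAILED       30 ms\n"
    "test tablespace                   ... ok          345 ms\n"
)

if __name__ == "__main__":
    for name, ms in parse_timings(SAMPLE):
        print(f"{name},{ms}")
```

Lines with trailing whitespace or carriage returns would not match, so logs
from some platforms might need normalising first.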

Of course, that only looked at check.log, and more logic would be needed if
we want to look into the other areas as well, but as long as it's
pg_regress output I think it should be easy?
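
To sketch what that extra logic might look like, the snippet below applies
the same regexp to several step logs and emits CSV rows keyed by step and
test name. Everything other than the regexp is an assumption for
illustration: the function name timings_to_csv, the step labels "check" and
"contrib-check", the column names, and the sample log lines are all
hypothetical and not taken from the buildfarm code.

```python
import csv
import io
import re

# Same regexp as used for check.log; it only matches tests reported as ok.
STEP_RE = re.compile(r'^(?:test)?\s+(\S+)\s+\.\.\. ok\s+(\d+) ms$', re.MULTILINE)


def timings_to_csv(logs_by_step):
    """Turn {step_label: log_text} into CSV text.

    One row per passing test: step, testname, runtime_ms.
    The step labels are hypothetical; the caller chooses them.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["step", "testname", "runtime_ms"])
    for step, text in logs_by_step.items():
        for name, ms in STEP_RE.findall(text):
            writer.writerow([step, name, int(ms)])
    return out.getvalue()


if __name__ == "__main__":
    # Made-up logs: the same test name appears under two different steps.
    logs = {
        "check": "test boolean   ... ok    12 ms\n",
        "contrib-check": "test boolean  ... ok   7 ms\n",
    }
    print(timings_to_csv(logs), end="")
```

Including the step in each row is one way to keep test names that repeat
across steps distinct; a module-name prefix would be another.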

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
