Re: CI and test improvements

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Noah Misch <noah(at)leadboat(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: CI and test improvements
Date: 2022-08-28 17:10:29
Message-ID: 20220828171029.GO2342@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Aug 28, 2022 at 09:07:52AM -0700, Andres Freund wrote:
> > --- /dev/null
> > +++ b/src/tools/ci/windows-compiler-warnings
>
> Wouldn't that be doable as something like
> sh -c 'if test -s file; then cat file;exit 1; fi"
> inside .cirrus.yml?

I had written it inline in a couple ways, like
- sh -exc 'f=msbuild.warn.log; if [ -s "$f" ]; then cat "$f"; exit 1; else exit 0; fi'

but then separated it out as you suggested in
20220227010908(dot)vz2a7dmfzgwg742w(at)alap3(dot)anarazel(dot)de

after I complained about cmd.exe requiring escaping for && and ||
That makes writing any shell script a bit perilous and a separate script
seems better.

> > Subject: [PATCH 03/25] cirrus/ccache: disable compression and show stats
> >
> > ccache since 4.0 enables zstd compression by default.
> >
> > With default compression enabled (https://cirrus-ci.com/task/6692342840164352):
> > linux has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616
> > macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500
> > freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064
> > windows has 4.6.1; 180MB cirrus cache; cache_size_kibibyte 51179
> > todo: compiler warnings
> >
> > With compression disabled (https://cirrus-ci.com/task/4614182514458624):
> > linux: 91MB cirrus cache; cache_size_kibibyte 316136
> > macos: 41MB cirrus cache; cache_size_kibibyte 118068
> > windows: 42MB cirrus cache; cache_size_kibibyte 134064
> > freebsd is the same
> >
> > The stats should either be shown or logged (or maybe run with CCACHE_NOSTATS,
> > to avoid re-uploading cache tarball in a 100% cached build, due only to
> > updating ./**/stats).
> >
> > Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.
>
> I stared at this commit message for a while, trying to make sense of it, and
> couldn't really. I assume you're saying that the cirrus compression is better
> with ccache compression disabled, but it's extremely hard to parse out of it.

Yes, because ccache uses zstd-1, and cirrus uses gzip, which it's going
to use no matter what ccache does, and gzip's default -6 is better than
ccache's zstd-1.

> This does too much at once. Show stats, change cache sizes, disable
> compression.

The cache size change is related to the compression level change; ccache
prunes based on the local size, which was compressed with zstd-1 and,
with this patch, not compressed (so ~2x larger). Also, it's more
interesting to control the size uploaded to cirrus (after compression
ith gzip-6).

> > From 01e9abd386a4e6cc0125b97617fb42e695898cbf Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> > Date: Tue, 26 Jul 2022 20:30:02 -0500
> > Subject: [PATCH 04/25] cirrus/ccache: add explicit cache keys..
> >
> > Since otherwise, building with ci-os-only will probably fail to use the normal
> > cache, since the cache key is computed using both the task name and its *index*
> > in the list of caches (internal/executor/cache.go:184).
>
> Hm, perhaps worth confirming and/or reporting to cirrus rather?

I know because of reading their source. Unfortunately, there's no
commit history indicating the intent or rationale.
https://github.com/cirruslabs/cirrus-ci-agent/blob/master/internal/executor/cache.go#L183

> > From 6a6a97fc869fd1fd8b7ab5da5147f145581634f9 Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> > Date: Fri, 24 Jun 2022 00:09:12 -0500
> > Subject: [PATCH 08/25] cirrus/freebsd: run with more CPUs+RAM and do not
> > repartitiion
> >
> > There was some historic problem where tests under freebsd took 8+ minutes (and
> > before 4a288a37f took 15 minutes).
> >
> > This reduces test time from 10min to 3min.
> > 4 CPUs 4 tests https://cirrus-ci.com/task/4880240739614720
> > 4 CPUs 6 tests https://cirrus-ci.com/task/4664440120410112 https://cirrus-ci.com/task/4586784884523008
> > 4 CPUs 8 tests https://cirrus-ci.com/task/5001995491737600
> >
> > 6 CPUs https://cirrus-ci.com/task/6678321684545536
> > 8 CPUs https://cirrus-ci.com/task/6264854121021440
> >
> > See also:
> > https://www.postgresql.org/message-id/flat/20220310033347(dot)hgxk4pyarzq4hxwp(at)alap3(dot)anarazel(dot)de#f36c0b17e33e31e7925e7e5812998686
> > 8 jobs 7min https://cirrus-ci.com/task/6186376667332608
> >
> > xi-os-only: freebsd
>
> Typo.

No - it's deliberate so I can switch to and from "everything" to "this
only".

> > @@ -71,8 +69,6 @@ task:
> > fingerprint_key: ccache/freebsd
> > reupload_on_changes: true
> >
> > - # Workaround around performance issues due to 32KB block size
> > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
> > create_user_script: |
> > pw useradd postgres
> > chown -R postgres:postgres .
> > --
>
> What's the story there - at some point that was important for performance
> because of the native block size triggering significant read-modify-write
> cycles with postres' writes. You didn't comment on it in the commit message.

Well, I don't know the history, but it seems to be unneeded now.

Is there a good description of the original problem ? Originally,
freebsd check-world took ~15min to run tests, and when we changed to use
-Og it took 10min. Since then, seems to have improved on its own, and
currently takes ~6min. This patch adds CPUs to make it run in ~4min,
and takes the opportuity to drop the historic repartition stuff.

> > From fd1c36a0bd8fa608ccdff5be3735dac5e3e48bf3 Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> > Date: Wed, 27 Jul 2022 16:54:47 -0500
> > Subject: [PATCH 09/25] cirrus/freebsd: run build+check in a make vpath
>
> > From 7052a32a21752b59632225684fc9426bb94e46e0 Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> > Date: Sun, 13 Feb 2022 17:56:40 -0600
> > Subject: [PATCH 10/25] cirrus/windows: increase timeout to 25min
>
> No explanation?

Because of the immediately following commit which makes it run all the
tests.

> > From 602983b2cf37fc43465c62330b2e15e9d6d2035d Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> > Date: Fri, 26 Aug 2022 12:00:10 -0500
> > Subject: [PATCH 15/25] f!and chdir
>
> I don't see the point of pointing fixup commits to the list.

It's a separate commit to make it easy to see the changes, separately,
since I imagine maybe the "chdir" part won't be desirable, or maybe the
PATH part won't. But I'm not sure, so I'm here soliciting feedback.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-08-28 17:22:39 Re: Expand palloc/pg_malloc API
Previous Message Justin Pryzby 2022-08-28 17:08:07 Re: [RFC] building postgres with meson - v12