Re: CI and test improvements

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Noah Misch <noah(at)leadboat(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, samay sharma <smilingsamay(at)gmail(dot)com>
Subject: Re: CI and test improvements
Date: 2022-11-13 23:53:04
Message-ID: 20221113235303.GA26337@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 04, 2022 at 06:59:46PM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-11-04 18:54:12 -0500, Justin Pryzby wrote:
> > Subject: [PATCH 1/8] meson: PROVE is not required
> > Subject: [PATCH 3/8] meson: rename 'main' tasks to 'regress' and 'isolation'
>
> Pushed, thanks for the patches.

Thanks.

> > diff --git a/.cirrus.yml b/.cirrus.yml
> > index 9f2282471a9..99ac09dc679 100644
> > --- a/.cirrus.yml
> > +++ b/.cirrus.yml
> > @@ -451,12 +451,20 @@ task:
> >
> > build_script: |
> > vcvarsall x64
> > - ninja -C build
> > + ninja -C build |tee build/meson-logs/build.txt
> > + REM Since pipes lose exit status of the preceding command, rerun compilation,
> > + REM without the pipe exiting now if it fails, rather than trying to run checks
> > + ninja -C build > nul
>
> This seems mighty grotty :(. but I guess it's quick enough not worry about,
> and I can't come up with a better plan.
>
> It doesn't seem quite right to redirect into meson-logs/ to me, my
> interpretation is that that's "meson's namespace". Why not just store it in
> build/?

I put it there so it'd be included with the build artifacts.
Maybe it's worth adding a separate line to artifacts for stuff like
this, and ccache log ?

> > From e85fe83fd8a4b4c79a96d2bf66cd6a5e1bdfcd1e Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> > Date: Sat, 26 Feb 2022 19:34:35 -0600
> > Subject: [PATCH 5/8] cirrus: build docs as a separate task..

> > + # Exercise HTML and other docs:
> > + ninja_docs_build_script: |
> > + mkdir build.ninja
> > + cd build.ninja
>
> Perhaps build-ninja instead? build.ninja is the filename for ninja's build
> instructions, so it seems a bit confusing.

Sure.

Do you think building docs with both autoconf and meson is what's
desirable here ?

I'm not sure if this ought to be combined with/before/after your "move
compilerwarnings task to meson" patch? (Regarding that patch: I
mentioned that it shouldn't use ccache -C, and it should use
meson_log_artifacts.)

> > From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> > Date: Tue, 26 Jul 2022 20:30:02 -0500
> > Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys..
> >
> > Since otherwise, building with ci-os-only will probably fail to use the
> > normal cache, since the cache key is computed using both the task name
> > and its *index* in the list of caches (internal/executor/cache.go:184).
>
> Seems like this would potentially better addressed by reporting a bug to the
> cirrus folks?

You said that before, but I don't think so - since they wrote code to do
that, it's odd to file a bug that says that the behavior is wrong. I am
curious why, but it seems delibrate.

https://www.postgresql.org/message-id/20220828171029.GO2342%40telsasoft.com

> There's enough copies of this to make it worth deduplicating. If we
> use something like fingerprint_script: echo
> ccache/$CIRRUS_BRANCH/$CIRRUS_OS we can use a yaml ref?
I'll look into it...

> I think you experimented with creating a 'base' ccache dir (e.g. on the master
> branch) and then using branch specific secondar caches?

I have to revisit that sometime.

That's a new feaure in ccache 4.4, which is currently only in macos.
This is another thing that it'd be easier to test by having cfbot
clobber the cirrus.yml rather than committing to postgres repo.
(Technically, it should probably only do use the in-testing cirrus.yml
if the patch it's testing doesn't itself modify .cirrus.yml)

> How did that turn out? I think cfbot's caches constantly get removed
> due to overrunning the global space.

For cfbot, I don't know if there's much hope that any patch-specific
build artifacts will be cached from the previous run, typically ~24h
prior.

One idea I have, for the "Warnings" task (and maybe linux too), is to
defer pruning until after all the compilations. To avoid LRU pruning
during early tasks causing bad hit ratios of later tasks.

Another thing that probably happens is that task1 starts compiling
patch1, and then another instance of task1 starts compiling patch2. A
bit later, the first instance will upload its ccache result for patch1,
which will be summarily overwritten by the second instance's compilation
result, which doesn't include anything from the first instance.

Also, whenever ccache hits its MAXSIZE threshold, it prunes the cache
down to 80% of the configured size, which probably wipes away everything
from all but the most recent ~20 builds.

I also thought about having separate caches for each compilation in the
warnings task - but that requires too much repeated yaml just for that..

> > From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> > Date: Sun, 3 Apr 2022 00:10:20 -0500
> > Subject: [PATCH 7/8] cirrus/ccache: disable compression and show stats
> >
> > linux/debian/bullseye has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616
> > macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500
> > freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064
> > XXX windows has 4.7.2; 180MB cirrus cache; cache_size_kibibyte 51179
>
> I'm still somewhat doubtful this is a good idea. The mingw cache is huge, for
> example, and all that additional IO and memory usage is bound to show up.

I think you're referring to the proposed mingw task which runs under
windows, and not the existing cross-compilation ?

And you're right - I remember this now (I think it's due to PCH?)

In my local copy I'd "unset CCACHE_NOCOMPRESS". But I view that as an
oddity of windows headers, rather than an argument against disabling
compression elsewhere. BTW, freebsd ccache is too old to use
compression.

What about using CCACHE_HARDLINK (which implies no compression) ?

> > Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.
> > The log should be written *outside* the ccache folder - it shouldn't be
> > preserved across cirrusci task invocations.
>
> I assume we don't have a new enough ccache everywhere yet?

No - see above.

I've added patches to update macos.

--
Justin

Attachment Content-Type Size
0001-cirrus-windows-add-compiler_warnings_script.patch text/x-diff 2.5 KB
0002-cirrus-build-docs-as-a-separate-task.patch text/x-diff 2.3 KB
0003-cirrus-ccache-add-explicit-cache-keys.patch text/x-diff 1.5 KB
0004-cirrus-ccache-avoid-clearing-cache-until-after-all-c.patch text/x-diff 1.9 KB
0005-cirrus-ccache-disable-compression-and-show-stats.patch text/x-diff 3.6 KB
0006-cirrus-warnings-use-a-single-common-always-block.patch text/x-diff 3.4 KB
0007-cirrus-clean-up-windows-task.patch text/x-diff 2.2 KB
0008-cirrus-switch-to-macos_instance.patch text/x-diff 2.4 KB
0009-cirrus-update-to-macos-ventura.patch text/x-diff 933 bytes
0010-cirrus-freebsd-run-with-more-CPUs-RAM-and-do-not-rep.patch text/x-diff 1.9 KB
0011-cirrus-split-linux-and-move-the-only_if-line.patch text/x-diff 4.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-11-14 00:05:31 Re: Suppressing useless wakeups in walreceiver
Previous Message Thomas Munro 2022-11-13 23:51:14 Re: Suppressing useless wakeups in walreceiver