Re: CI and test improvements

From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Noah Misch <noah(at)leadboat(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: CI and test improvements
Date: 2022-08-28 16:07:52
Message-ID: 20220828160752.l5l66k3eptokzhzj@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-08-28 09:44:47 -0500, Justin Pryzby wrote:
> On Sat, May 28, 2022 at 10:37:41AM -0500, Justin Pryzby wrote:
> > I'm anticipating the need to further re-arrange the patch set - it's not clear
> > which patches should go first. Maybe some patches should be dropped in favour
> > of the meson project. I guess some patches will have to be re-implemented with
> > meson (msvc warnings).
>
> > Maybe the improvements to vcregress should go into v15 ? CI should run all the
> > tests (which also serves to document *how* to run all the tests, even if there
> > isn't a literal check-world target).
>
> On Thu, Jun 23, 2022 at 02:31:25PM -0500, Justin Pryzby wrote:
> > Should any of the test changes go into v15 ?
>
> On Thu, Jul 07, 2022 at 07:22:32PM -0500, Justin Pryzby wrote:
> > Also, cirrus/freebsd task can run 3x faster with more CPUs.
>
> Checking if there's interest in any/none of these patches ?
> I have added several more.
>
> Do you have an idea when the meson branch might be merged ?

I hope to do that fairly soon, but it's of course dependant on review etc. The
plan was to merge it early and mature it in tree to some degree. There's only
so much we can do "from the outside"...

> Will vcregress remain for a while, or will it go away for v16 ?

The plan was for the windows stuff to go away fairly quickly.

> From 99ee0bef5054539aad0e23a24dd9c9cc9ccee788 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> Date: Wed, 25 May 2022 21:53:22 -0500
> Subject: [PATCH 01/25] cirrus/windows: add compiler_warnings_script

Looks good.

> - MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo
> + # -fileLoggerParameters1: write warnings to msbuild.warn.log.
> + MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo -fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log

Except, I think it'd be good to split this line. What do you think about using
something like
MSBFLAGS: >-
-nologo
-m -verbosity:minimal
/p:TrackFileAccess=false
"-consoleLoggerParameters:Summary;ForceNoAlign"
-fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log

I think that should work?

> # If tests hang forever, cirrus eventually times out. In that case log
> # output etc is not uploaded, making the problem hard to debug. Of course
> @@ -450,6 +451,11 @@ task:
> cd src/tools/msvc
> %T_C% perl vcregress.pl ecpgcheck
>
> + # These should be last, so all the important checks are always run
> + always:
> + compiler_warnings_script:
> + - sh src\tools\ci\windows-compiler-warnings msbuild.warn.log
> +
> on_failure:
> <<: *on_failure
> crashlog_artifacts:
> diff --git a/src/tools/ci/windows-compiler-warnings b/src/tools/ci/windows-compiler-warnings
> new file mode 100755
> index 00000000000..d6f9a1fc569
> --- /dev/null
> +++ b/src/tools/ci/windows-compiler-warnings
> @@ -0,0 +1,16 @@
> +#! /bin/sh
> +# Success if the given file doesn't exist or is empty, else fail
> +# This is a separate file only to avoid dealing with windows shell quoting and escaping.
> +set -e
> +
> +fn=$1
> +
> +if [ -s "$fn" ]
> +then
> + # Display the file's content, then exit indicating failure
> + cat "$fn"
> + exit 1
> +else
> + # Success
> + exit 0
> +fi
> --
> 2.17.1

Wouldn't that be doable as something like
sh -c 'if test -s file; then cat file;exit 1; fi"
inside .cirrus.yml?

> From 1064a0794e85e06b3a0eca4ed3765f078795cb36 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> Date: Sun, 3 Apr 2022 00:10:20 -0500
> Subject: [PATCH 03/25] cirrus/ccache: disable compression and show stats
>
> ccache since 4.0 enables zstd compression by default.
>
> With default compression enabled (https://cirrus-ci.com/task/6692342840164352):
> linux has 4.2; 99MB cirrus cache; cache_size_kibibyte 109616
> macos has 4.5.1: 47MB cirrus cache; cache_size_kibibyte 52500
> freebsd has 3.7.12: 42MB cirrus cache; cache_size_kibibyte 134064
> windows has 4.6.1; 180MB cirrus cache; cache_size_kibibyte 51179
> todo: compiler warnings
>
> With compression disabled (https://cirrus-ci.com/task/4614182514458624):
> linux: 91MB cirrus cache; cache_size_kibibyte 316136
> macos: 41MB cirrus cache; cache_size_kibibyte 118068
> windows: 42MB cirrus cache; cache_size_kibibyte 134064
> freebsd is the same
>
> The stats should either be shown or logged (or maybe run with CCACHE_NOSTATS,
> to avoid re-uploading cache tarball in a 100% cached build, due only to
> updating ./**/stats).
>
> Note that ccache 4.4 supports CCACHE_STATSLOG, which seems ideal.

I stared at this commit message for a while, trying to make sense of it, and
couldn't really. I assume you're saying that the cirrus compression is better
with ccache compression disabled, but it's extremely hard to parse out of it.

This does too much at once. Show stats, change cache sizes, disable
compression.

> From 01e9abd386a4e6cc0125b97617fb42e695898cbf Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> Date: Tue, 26 Jul 2022 20:30:02 -0500
> Subject: [PATCH 04/25] cirrus/ccache: add explicit cache keys..
>
> Since otherwise, building with ci-os-only will probably fail to use the normal
> cache, since the cache key is computed using both the task name and its *index*
> in the list of caches (internal/executor/cache.go:184).

Hm, perhaps worth confirming and/or reporting to cirrus rather?

> From 8de5c977686270b0a4e666a924ebe820a245913a Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> Date: Sun, 24 Jul 2022 23:09:12 -0500
> Subject: [PATCH 05/25] silence make distprep and generated-headers
>
> this saves vertical screen space.
>
> https://www.postgresql.org/message-id/20220221164736.rq3ornzjdkmwk2wo@alap3.anarazel.de

I don't feel this should go in as a part of CI changes. Or rather, I feel
uncomfortable committing it when just discussed under this subject.

> From eaf263ccaa8310c5d9834b97e93ad8434d63296e Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> Date: Sun, 24 Jul 2022 22:44:53 -0500
> Subject: [PATCH 06/25] pg_regress: run more quietly
>
> The number of lines of output should be closer to 1 per test, rather than 25 +
> 1 per test.
>
> https://www.postgresql.org/message-id/flat/20220221173109.yl6dqqu3ud52ripd%40alap3.anarazel.de

See above. There's also a dedicated thread about revising the output.

> From 6a6a97fc869fd1fd8b7ab5da5147f145581634f9 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> Date: Fri, 24 Jun 2022 00:09:12 -0500
> Subject: [PATCH 08/25] cirrus/freebsd: run with more CPUs+RAM and do not
> repartitiion
>
> There was some historic problem where tests under freebsd took 8+ minutes (and
> before 4a288a37f took 15 minutes).
>
> This reduces test time from 10min to 3min.
> 4 CPUs 4 tests https://cirrus-ci.com/task/4880240739614720
> 4 CPUs 6 tests https://cirrus-ci.com/task/4664440120410112 https://cirrus-ci.com/task/4586784884523008
> 4 CPUs 8 tests https://cirrus-ci.com/task/5001995491737600
>
> 6 CPUs https://cirrus-ci.com/task/6678321684545536
> 8 CPUs https://cirrus-ci.com/task/6264854121021440
>
> See also:
> https://www.postgresql.org/message-id/flat/20220310033347(dot)hgxk4pyarzq4hxwp(at)alap3(dot)anarazel(dot)de#f36c0b17e33e31e7925e7e5812998686
> 8 jobs 7min https://cirrus-ci.com/task/6186376667332608
>
> xi-os-only: freebsd

Typo.

> @@ -71,8 +69,6 @@ task:
> fingerprint_key: ccache/freebsd
> reupload_on_changes: true
>
> - # Workaround around performance issues due to 32KB block size
> - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
> create_user_script: |
> pw useradd postgres
> chown -R postgres:postgres .
> --

What's the story there - at some point that was important for performance
because of the native block size triggering significant read-modify-write
cycles with postres' writes. You didn't comment on it in the commit message.

> From fd1c36a0bd8fa608ccdff5be3735dac5e3e48bf3 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> Date: Wed, 27 Jul 2022 16:54:47 -0500
> Subject: [PATCH 09/25] cirrus/freebsd: run build+check in a make vpath

> From 7052a32a21752b59632225684fc9426bb94e46e0 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> Date: Sun, 13 Feb 2022 17:56:40 -0600
> Subject: [PATCH 10/25] cirrus/windows: increase timeout to 25min

No explanation?

> From 602983b2cf37fc43465c62330b2e15e9d6d2035d Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> Date: Fri, 26 Aug 2022 12:00:10 -0500
> Subject: [PATCH 15/25] f!and chdir

I don't see the point of pointing fixup commits to the list.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-08-28 17:08:07 Re: [RFC] building postgres with meson - v12
Previous Message Peter Eisentraut 2022-08-28 15:55:15 Convert *GetDatum() and DatumGet*() macros to inline functions