Re: Adding CI to our tree

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Adding CI to our tree
Date: 2021-12-17 11:34:36
Message-ID: 03b72c7f-a324-55cd-679f-e9b0543d7f25@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 13.12.21 22:12, Andres Freund wrote:
> Attached is an updated version of the CI patches. An example of a test run
> with the attached version of this
> https://cirrus-ci.com/build/6501998521483264

+ only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' ||
$CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'

I'm not in favor of this kind of thing. I don't understand how this is
useful, other than perhaps for developing *this* patch. I don't think
people would like having these tags in the mainline, and if it's for
local use, then people can adjust the file locally.

+ CC="ccache cc" CFLAGS="-O0 -ggdb"'

I don't think using -O0 is the right thing. It will miss some compiler
warnings, and it will not thoroughly test the compiler. We should test
using the configurations that are close to what users actually use.

+ - su postgres -c 'gmake -s -j3 && gmake -s -j3 -C contrib'

Why doesn't this use make world (or world-bin, if you prefer).

Why does this use -j3 if there are two CPUs configured? (Perhaps the
number of CPUs should be put into a variable.)

I don't like that the -s option is used. I would like to see what
commands are executed.

+ cpu: 4

Why does the Linux job use 4 CPUs and the FreeBSD job 2?

+ - export

I don't think that is portable to all shells.

+ - su postgres -c 'time script test.log gmake -s -j2 ${CHECK}
${CHECKFLAGS}'

+ su postgres -c '\
+ ulimit -c unlimited; \
+ make -s ${CHECK} ${CHECKFLAGS} -j8 \
+ '

Not clear why these are so different. Don't we need the test.log file
for Linux? Don't we need the ulimit call for FreeBSD? Why the -j8
option even though 4 CPUs have been configured?

+ brew install \
+ ccache \
+ coreutils \
+ icu4c \
+ krb5 \
+ llvm \
+ lz4 \
+ make \
+ openldap \
+ openssl \
+ python(at)3(dot)10 \
+ tcl-tk

Curious why coreutils and make are installed? The system-supplied tools
ought to work.

+ brew cleanup -s

Seems unnecessary?

+ PKG_CONFIG_PATH="/usr/local/opt/krb5/lib/pkgconfig:$PKG_CONFIG_PATH"

AFAICT, we don't use pkg-config for the krb5 package.

+ - gmake -s -j12 && gmake -s -j12 -C contrib

These numbers should also be explained or configured somewhere.
Possibly query the number of CPUs on the instance.

+ PROVE_FLAGS: -j10

Why only on Windows?

+ # Installation on windows currently only completely works from
src\tools\msvc

If that is so, let's fix that. But see that install.pl contains

if (-e "src/tools/msvc/buildenv.pl")

etc. it seems to want to be able to be invoked from the top level.

+ - cd src\tools\msvc && perl .\install.pl
%CIRRUS_WORKING_DIR%\tmp_install

Confusing mix of forward and backward slashes in the Windows section. I
think forward slashes should work everywhere.

+ test_plcheck_script:
+ - perl src/tools/msvc/vcregress.pl plcheck

etc. Couldn't we enhance vcregress.pl to take multiple arguments or take
a "check-world" argument or something. Otherwise, this will be tedious
to keep updated.

+ test_subscriptioncheck_script:
+ - perl src/tools/msvc/vcregress.pl taptest .\src\test\subscription\

This is even worse. I don't want to have to hand-register every new TAP
test.

+ always:
+ gcc_warning_script: |
+ time ./configure \
+ --cache gcc.cache CC="ccache gcc" \
+ --enable-dtrace

I don't know why we wouldn't need the full set of options here. It's
not like optional code never has compiler warnings.

+ # cross-compile to windows
+ always:
+ mingw_cross_warning_script: |

I would welcome a native mingw build with full options and test suite
run. This cross-compiling stuff is of course interesting, but I'm not
sure why it is being preferred over a full native run.

--- /dev/null
+++ b/.dockerignore
@@ -0,0 +1,7 @@
+# Ignore everything, except ci/

I wonder whether this would interfere with other uses of docker. I
suppose people might have their custom setups for building docker images
from PostgreSQL sources. It seems weird that this file gets this
prominence, saying that the canonical use of docker inside PostgreSQL
sources is for Cirrus CI.

It would be useful if the README explained the use of docker. As I
mentioned before, it's not immediately clear why docker is used at all
in this.

The docker file for Windows contains a lot of hardcoded version numbers.
This should at least be refactored a little bit so that it is clear
which version numbers should be updated and how. Or better yet, avoid
the need to constantly update version numbers. For example, the Python
patch release changes every few weeks (e.g., 3.10.0 isn't current
anymore). Also, the way OpenSSL is installed looks a bit fishy. Is
this what people actually use in practice? How can we make it match
actual practice better?

+# So that tests using the "manually" started postgres on windows can use
+# prepared statements
+max_prepared_transactions = 10

Maybe add that to the pg_ctl invocation in the Windows section instead.

+# Settings that make logs more useful
+log_autovacuum_min_duration = 0
+log_checkpoints = true
+log_connections = true
+log_disconnections = true
+log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
+log_lock_waits = true

If we think these are useful, we should make the test suite drivers set
them for all users.

> One I explicitly brought up before:
>
> On 2021-10-01 15:27:52 -0700, Andres Freund wrote:
>> One policy discussion that we'd have to have is who should control the images
>> used for CI. Right now that's on my personal google cloud account - which I am
>> happy to do, but medium - long term that'd not be optimal.
>
> The proposed CI script uses custom images to run linux and freebsd tests. They
> are automatically built every day from the repository https://github.com/anarazel/pg-vm-images/
>
> These images have all the prerequisites pre-installed. For Linux something
> similar can be achieved by using dockerfiles referenced the .cirrus.yml file,
> but for FreeBSD that's not available. Installing the necessary dependencies on
> every run is too time intensive. For linux, the tests run a lot faster in
> full-blown VMs than in docker, and full VMs allow a lot more control /
> debugging.
>
> I think this may be OK for now, but I also could see arguments for wanting to
> transfer the image specifications and the google account to PG properties.

For the above reasons of lack of documentation, I still don't understand
the whole docker flow here. Do you mean, the docker files included in
your patch are not actually used as part of the CI run; instead you use
them to build images manually, which are then pulled in by the test runs?

If so, apart from the general problem of having this go through some
personal account, I also have concerns how this can be kept up to date,
given how often the dependent software changes, as mentioned above.

I think it would be much easier to get this project over the initial
hump if we skipped the whole docker business and just set the images up
from scratch on each run.

> The second attention-worthy point is the choice of a new toplevel ci/
> directory as the location for resources referencenced by CI. A few other
> projects also use ci/, but I can also see arguments for moving the contents to
> e.g. src/tools/ci or such?

Or src/tools/cirrus/? This providers came and go, and before long there
might be interest in another one.

> - Extend the set of compiler warnings - as the compiler version is controlled,
> we could be more aggressive than we can be via configure.ac.

Not sure about that. I don't want this to evolve into some separate
pool of policies that yells at you because of some settings that you
never heard of. If we think other warnings are useful, we should
provide a way to select them, perhaps optionally, from the usual build
system.

> - consider enable compile-time debugging options like COPY_PARSE_PLAN_TREES,
> and run-time force_parallel_mode = regress on some platforms. They seem to
> catch a lot of problems during development and are likely affordable enough.

That would be useful if we can think of a way to select it optionally.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-12-17 11:59:28 Re: row filtering for logical replication
Previous Message Masahiko Sawada 2021-12-17 11:13:11 Re: Skipping logical replication transactions on subscriber side