[RFC] building postgres with meson

From: Andres Freund
Subject: [RFC] building postgres with meson
Date: 2021-10-12
Lists: pgsql-hackers


For the last year or so I've on and off tinkered with $subject. I think
it's in a state worth sharing now. First, let's look at a little
comparison.

My workstation:

non-cached configure:
current: 11.80s
meson: 6.67s

non-cached build (world-bin):
current: 40.46s
ninja: 7.31s

no-change build:
current: 1.17s
ninja: 0.06s

test world:
current: 105s
meson: 63s
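
To put the workstation numbers above in relative terms, here is a small
sketch that computes the speedup factor for each step. The timings are
copied from the list above; the dictionary layout and the helper name
`speedup` are purely illustrative and not part of the patch.

```python
# Timings in seconds, copied from the workstation comparison above.
# Each entry is (current autoconf/make, meson/ninja).
TIMINGS = {
    "non-cached configure": (11.80, 6.67),
    "non-cached build (world-bin)": (40.46, 7.31),
    "no-change build": (1.17, 0.06),
    "test world": (105.0, 63.0),
}


def speedup(current: float, new: float) -> float:
    """Return how many times faster the new timing is than the current one."""
    return current / new


if __name__ == "__main__":
    for step, (current, new) in TIMINGS.items():
        print(f"{step}: {speedup(current, new):.1f}x faster")
```

The ratio is largest for the no-change build, which is the case that
matters most during day-to-day incremental development.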

What actually started to motivate me however were the long times windows
builds took to come back with test results. On CI, with the same machine:

build:
current: 202s (doesn't include genbki etc)
meson+ninja: 140s
meson+msbuild: 206s

test:
current: 1323s (many commands)
meson: 903s (single command)

(note that the test comparison isn't quite fair - there's a few tests
missing, but it's just small contrib ones afaik)

The biggest difference to me however is not the speed, but how readable
the output is.

Running the tests with meson in a terminal shows the number of tests
that completed out of how many total, how much time has passed, and how
long the currently running tests have already been running.

At the end of a test run a count of tests is shown:

188/189 postgresql:tap+pg_basebackup / pg_basebackup/t/010_pg_basebackup.pl OK 39.51s 110 subtests passed
189/189 postgresql:isolation+snapshot_too_old / snapshot_too_old/isolation OK 62.93s

Ok: 188
Expected Fail: 0
Fail: 1
Unexpected Pass: 0
Skipped: 0
Timeout: 0

Full log written to /tmp/meson/meson-logs/testlog.txt

The log has the output of the tests and ends with:

Summary of Failures:
120/189 postgresql:tap+recovery / recovery/t/007_sync_rep.pl ERROR 7.16s (exit status 255 or signal 127 SIGinvalid)

Quite the difference to make check-world -jnn output.
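
Because each result sits on one regular line, it is easy to post-process.
The following is a minimal sketch of parsing such lines, assuming the
single-space layout shown in this mail (real terminal output may be
column-aligned). The regular expression, the `ResultLine` class and
`parse_result_line` are hypothetical helpers, not something meson ships.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches lines like:
#   188/189 postgresql:tap+pg_basebackup / pg_basebackup/t/... OK 39.51s ...
RESULT_RE = re.compile(
    r"^\s*(?P<num>\d+)/(?P<total>\d+)\s+"
    r"(?P<suite>\S+)\s+/\s+(?P<name>\S+)\s+"
    r"(?P<status>[A-Z]+)\s+(?P<secs>\d+(?:\.\d+)?)s"
    r"(?:\s+(?P<detail>.*))?$"
)


@dataclass(frozen=True)
class ResultLine:
    number: int            # position of this test in the run
    total: int             # total number of tests in the run
    suite: str             # e.g. "postgresql:tap+recovery"
    name: str              # e.g. "recovery/t/007_sync_rep.pl"
    status: str            # e.g. "OK" or "ERROR"
    seconds: float         # wall-clock duration
    detail: Optional[str]  # trailing text, if any


def parse_result_line(line: str) -> Optional[ResultLine]:
    """Parse one result line; return None for lines that do not match."""
    m = RESULT_RE.match(line)
    if m is None:
        return None
    return ResultLine(
        number=int(m["num"]),
        total=int(m["total"]),
        suite=m["suite"],
        name=m["name"],
        status=m["status"],
        seconds=float(m["secs"]),
        detail=m["detail"],
    )


if __name__ == "__main__":
    sample = ("120/189 postgresql:tap+recovery / recovery/t/007_sync_rep.pl "
              "ERROR 7.16s (exit status 255 or signal 127 SIGinvalid)")
    print(parse_result_line(sample))
```

Filtering the parsed results on `status != "OK"` gives the list of
failures together with the test name needed to find the matching log.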

So, now that the teasing is done, let me explain a bit what led me down
this path:

Autoconf + make is not being actively developed. Especially autoconf is
*barely* in maintenance mode - despite many shortcomings and bugs. It's
also technology that very few want to use - autoconf m4 is scary, and
it's scarier for people that started more recently than a lot of us
committers for example.

Recursive make as we use it is hard to get right. One reason the clean
make build is so slow compared to meson is that we had to resort to
.NOTPARALLEL to handle dependencies in a bunch of places. And despite
that, I quite regularly see incremental build failures that can be
resolved by retrying the build.

While we have incremental builds via --enable-depend, they don't work
that reliably (i.e. they miss necessary rebuilds) and yet are often too
aggressive. More modern build systems can keep track of the precise
command used to build a target and rebuild it when that command changes.
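
As an illustration of that last point, here is a minimal sketch of
command tracking: a hash of the exact command line is stored per target,
and a target counts as stale when the hash no longer matches. This is
not how any particular build tool is implemented; the function names and
the in-memory dictionary are assumptions made for the example, and a
real tool would also check input files.

```python
import hashlib
from typing import Dict, List


def command_hash(command: List[str]) -> str:
    """Hash the exact argument vector used to build a target."""
    # Join with NUL so ["a b"] and ["a", "b"] hash differently.
    joined = "\0".join(command)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def needs_rebuild(target: str, command: List[str], db: Dict[str, str]) -> bool:
    """True if the target was never built or its command has changed."""
    return db.get(target) != command_hash(command)


def record_build(target: str, command: List[str], db: Dict[str, str]) -> None:
    """Remember the command that produced the target."""
    db[target] = command_hash(command)


if __name__ == "__main__":
    db: Dict[str, str] = {}
    cmd = ["cc", "-O2", "-c", "foo.c", "-o", "foo.o"]

    print(needs_rebuild("foo.o", cmd, db))   # never built before
    record_build("foo.o", cmd, db)
    print(needs_rebuild("foo.o", cmd, db))   # same command as last time

    # Changing a compiler flag changes the hash, so the target is stale
    # even though no source file was touched.
    debug_cmd = ["cc", "-O0", "-g", "-c", "foo.c", "-o", "foo.o"]
    print(needs_rebuild("foo.o", debug_cmd, db))
```

Timestamp-only dependency tracking cannot detect the flag change in the
last step, which is one way a build ends up missing necessary rebuilds.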

We also don't just have the autoconf / make buildsystem, there's also
the msvc project generator - something most of us unix-y folks do not
like to touch. I think that, combined with there being no easy way to
run all tests, and it being just different, really hurt our windows
developer appeal (and subsequently the quality of postgres on
windows). I'm not saying this to ding the project generator - that was
well before there were decent "meta" buildsystems out there (and in some
ways it is a small one itself).

The last big issue I have with the current situation is that there's no
good test integration. make check-world output is essentially unreadable
/ not automatically parseable. Which led to the buildfarm having a
separate list of things it needs to test, so that failures can be
pinpointed and paired with appropriate logs. That approach unfortunately
doesn't scale well to multi-core CPUs, slowing down the buildfarm by a
fair bit.

This all led me to experiment with improvements. I tried a few
somewhat crazy but incremental things like converting our buildsystem to
non-recursive make (I got it to build the backend, but it's too hard to
do manually I think), or to not run tests during the recursive make
check-world, but to append commands to a list of tests, that then is run
by a helper (can kinda be made to work). In the end I concluded that
the amount of time we'd need to invest to maintain our more-and-more
custom buildsystem going forward doesn't make sense.
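
For illustration, the "append commands to a list, then run them via a
helper" idea could look roughly like the sketch below. It is not the
helper from that experiment; the entry format (a name plus an argument
vector), the function names and the default job count are all
assumptions made for the example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# One collected test: (name, argument vector to execute).
TestEntry = Tuple[str, List[str]]


def run_one(entry: TestEntry) -> Tuple[str, int]:
    """Run a single collected test command and return its exit status."""
    name, command = entry
    proc = subprocess.run(command, capture_output=True, text=True)
    return name, proc.returncode


def run_all(tests: List[TestEntry], jobs: int = 4) -> Dict[str, int]:
    """Run all collected tests with up to `jobs` running concurrently."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(run_one, tests))


if __name__ == "__main__":
    # Stand-in commands; a real list would be appended to while the
    # recursive make walks the tree.
    tests: List[TestEntry] = [
        ("passes", [sys.executable, "-c", "raise SystemExit(0)"]),
        ("fails", [sys.executable, "-c", "raise SystemExit(3)"]),
    ]
    for name, status in run_all(tests).items():
        print(f"{name}: {'OK' if status == 0 else 'FAIL'} (exit {status})")
```

Separating collection from execution is what lets the runner schedule
tests across directories and report each result under its own name.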

Which led me to look around and analyze which other buildsystems there
are that could make some sense for us. The halfway decent list includes,
I think:
1) cmake
2) bazel
3) meson

cmake would be a decent choice, I think. However, I just can't fully
warm up to it. Something about it just doesn't quite sit right with
me. That's not a good enough reason to prevent others from suggesting to
use it, but it's good enough to justify not investing a lot of time in
it myself.

Bazel has some nice architectural properties. But it requires a JVM to
run - I think that basically makes it unsuitable for us. And the build
information seems quite arduous to maintain too.

Which left me with meson. It is a meta-buildsystem that can do the
actual work of building via ninja (the most common one, also targeted by
cmake), msbuild (visual studio project files, important for GUI work)
and xcode projects (I assume that's for a macos IDE, but I haven't tried
to use it). Meson roughly does what autoconf+automake did, in a
python-esque DSL, and outputs build-instructions for ninja / msbuild /
xcode. One interesting bit is that meson itself is written in python
(and fairly easy to contribute to - I got a few changes in now).

I don't think meson is perfect architecturally - e.g. its insistence on
not having functions ends up making it a bit harder to not end up
duplicating code. There are some user-interface oddities that are now
hard to fix fully, due to the fairly wide usage. But all-in-all it's pretty
nice to use.

It's worth calling out that a lot of large open source projects have been
/ are migrating to meson. qemu/kvm, mesa (core part of graphics stack on
linux and also widely used in other platforms), a good chunk of GNOME,
and quite a few more. Due to that it seems unlikely to be abandoned.

As far as I can tell the only OS that postgres currently supports that
meson doesn't support is HPUX. It'd likely be fairly easy to add
gcc-on-hpux support, a chunk more to add support for the proprietary

The attached patch (meson support is 0016, the rest is prerequisites
that aren't that interesting at this stage) converts most of postgres to
meson. There's a few missing contrib modules, only about half the
optional library dependencies are implemented, and I've only built on
x64. It builds on freebsd, linux, macos and windows (both ninja and
msbuild) and cross builds from linux to windows. Thomas helped make the
freebsd / macos pieces a reality, thanks!

I took a number of shortcuts (although there used to be a *lot*
more). So this shouldn't be reviewed to the normal standard of the
community - it's a prototype. But I think it's in a complete enough
shape that it allows for a well-informed evaluation.

What doesn't yet work / build:

- plenty of optional libraries, contrib, NLS, docs build

- PGXS - and I don't yet know what to best do about it. One
backward-compatible way would be to continue to use makefiles for pgxs,
but do the necessary replacement of Makefile.global.in via meson (and
not use that for postgres' own build). But that doesn't really
provide a nicer path for building postgres extensions on windows, so
it'd definitely not be a long-term path.

- JIT bitcode generation for anything but src/backend.

- anything but modern-ish x86. That's probably a small amount of work,
but something that needs to be done.

- exporting all symbols for extension modules on windows (the stuff for
postgres is implemented). Instead I marked the relevant symbols as
__declspec(dllexport). I think we should do that regardless of the
buildsystem change. Restricting symbol visibility via gcc's
-fvisibility=hidden for extensions results in a substantially reduced
number of exported symbols, and even reduces object size (and I think
improves the code too). I'll send an email about that separately.

There's a lot more stuff to talk about, but I'll stop with a small bit
of instructions below:

Demo / instructions:
# Get code
git remote add andres git@github.com:anarazel/postgres.git
git fetch andres
git checkout --track andres/meson

# setup build directory
meson setup build --buildtype debug
cd build

# build (uses automatically as many cores as available)
ninja

# change configuration, build again
meson configure -Dssl=openssl
ninja

# run all tests
meson test

# run just recovery tests
meson test --suite setup --suite recovery

# list tests
meson test --list


Andres Freund

