Re: Heads Up: cirrus-ci is shutting down June 1st

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: Heads Up: cirrus-ci is shutting down June 1st
Date: 2026-05-28 17:06:22
Message-ID: CAN55FZ1-qiOWtQH5o6Q_7LJ7S3Ef_hfDE068uP0hGjB3gzwghg@mail.gmail.com
Lists: pgsql-hackers

Hi,

Thank you for looking into this!

On Wed, 27 May 2026 at 21:10, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > Here is the v2, I took Jelte's patch and reviewed & merged it with my
> > patch. Updates and questions are:
> >
> > 1- I continued to use Jelte's container method (Linux tasks only for
> > now, BSD tasks will be included in the future) because I think that is
> > the future-proof way since we might want to generate our container
> > images in the future. Also, up-to-date Debian images can be tested
> > with this way; otherwise we would need to use Ubuntu 24.04.
>
> Good.
>
>
> > 2- io_uring tests work on the Linux Meson task.
>
> Is there a reason to not just do that for all the tasks?

I might have worded it incorrectly. I meant that the Linux Meson tests use:

PG_TEST_INITDB_EXTRA_OPTS: >-
  -c io_method=io_uring

This wasn't working before, but now it does. I guess we set this only
on Linux because we wanted to test io_method=worker in the other
tasks.

> > 3- I didn't put commands to helper scripts for now. I think it is a
> > good thing to have a helper script but it would be better to have this
> > helper script after the first version is committed since it can extend
> > the timeline. Also, I found that having all commands in one file makes
> > debugging easier.
>
> Hm. I'm a bit worried about this getting pretty unmaintainable, due to the
> repetition. I think at least we need to use yaml anchors to deduplicate some
> steps.

GitHub Actions added support for YAML anchors last year, but
unfortunately it doesn't support merge keys ('<<'). Related
information: [1].

> > 4- FreeBSD task has these options:
> >
> > PG_TEST_INITDB_EXTRA_OPTS: >-
> > -c debug_copy_parse_plan_trees=on
> > -c debug_write_read_parse_plan_trees=on
> > -c debug_raw_expression_coverage_test=on
> > -c debug_parallel_query=regress
> >
> > Since we won't have FreeBSD for the first version. I put these options
> > to the MacOS task but I couldn't decide where to put
> > 'PG_TEST_PG_UPGRADE_MODE: --link'.
>
> Makes sense.
>
>
> > Also, I am planning to work on back patches when we agree on the
> > upstream one. Does that sound good?
>
> Yep.
>
>
>
> > diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
> > new file mode 100644
> > index 00000000000..6d20068727c
> > --- /dev/null
> > +++ b/.github/workflows/ci.yml
> > @@ -0,0 +1,1125 @@
> > +# GitHub Actions CI configuration for PostgreSQL
> > +
> > +name: Github Actions CI
> > +
> > +on:
> > + push:
> > + branches: [ "*" ]
> > +
> > +# Default to the minimum privilege the jobs need (just reading the repo
> > +# contents during checkout). Individual jobs override this when they need
> > +# more, e.g. `cancel-previous` needs `actions: write` to cancel runs.
> > +permissions:
> > + contents: read
>
> I'm not sure I like that we ever need more than that. I'd expect that
> postgresql-cfbot will explicitly disable write permissions for runs.

Done. Updated the comment and removed the 'Cancel previous runs' step.

> > +# NB: intentionally NO workflow-level `concurrency:` block. The native
> > +# concurrency mechanism makes a new run wait for the previous one to fully
> > +# cancel before it starts — which can take a while. Instead the
> > +# `cancel-previous` job below fires a cancel API call asynchronously,
> > +# so the new run gets going immediately. On master the cancel job is skipped,
> > +# so every push runs to completion.
>
> Is this really worth having our own code? Seems like it'd not be that frequent
> to push if there are already running runs? What kind of delays are we talking
> about?

Jelte already answered this in [2]. The 'Cancel previous runs' step
has been removed, and the native concurrency mechanism is used instead.
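For illustration, a workflow-level concurrency block with the behaviour described in the removed comment (cancel superseded runs on feature branches, let every push to master run to completion) could look roughly like this. It is a sketch, not the v3 code, and the group naming is an assumption. Giving each master run its own group matters because, even with cancel-in-progress disabled, a concurrency group keeps at most one pending run and cancels older pending ones.

```yaml
# Sketch only; the group name format is an assumption, not taken from v3.
concurrency:
  # On master the group includes the run id, so every run is alone in its
  # group and is never cancelled or replaced by a later push. On any other
  # ref, all runs for that ref share one group.
  group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/master' && github.run_id || github.ref }}
  # Within a shared group, a new run cancels the one already in progress.
  cancel-in-progress: true
```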

> > + # To avoid unnecessarily spinning up a lot of VMs / containers for entirely
> > + # broken commits, have a minimal task that all others depend on.
> > + #
> > + # SPECIAL:
> > + # - Builds with --auto-features=disabled and thus almost no enabled
> > + # dependencies
> > + sanity-check:
> > + name: SanityCheck
> > + needs: setup
> > + if: needs.setup.outputs.sanitycheck == 'true'
> > + runs-on: ubuntu-latest
> > + timeout-minutes: 15
> > + container:
> > + image: ${{ needs.setup.outputs.linux_ci_image }}
> > + env:
> > + BUILD_JOBS: 8
> > + TEST_JOBS: 8
> > + CCACHE_DIR: ${{ github.workspace }}/ccache_dir
> > + # no options enabled, should be small
> > + CCACHE_MAXSIZE: "150M"
> > + steps:
> > + - uses: actions/checkout@v6
> > + with:
> > + fetch-depth: ${{ env.CLONE_DEPTH }}
> > +
> > + - name: Restore ccache
> > + uses: actions/cache@v5
>
> Seems like this is used by every task. Can we move this into a yaml anchor or
> such, by using a variable representing the job name?

GitHub Actions doesn't support merge keys, so we can't really
deduplicate these steps. I used a YAML anchor for the checkout step,
since it is exactly the same in all jobs.
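A minimal sketch of the kind of reuse that does work: an alias repeats the anchored node verbatim, so only steps that are identical in every job, like checkout, can be shared this way. The job names, the placeholder 'run' steps and the 'restore_ccache' anchor name in the trailing comment are hypothetical; CLONE_DEPTH is the workflow variable already used in the patch.

```yaml
jobs:
  sanity-check:
    runs-on: ubuntu-latest
    steps:
      # Define the anchor on the first occurrence of the step...
      - &checkout
        uses: actions/checkout@v6
        with:
          fetch-depth: ${{ env.CLONE_DEPTH }}
      - run: echo "build and test here"

  linux:
    runs-on: ubuntu-latest
    steps:
      # ...and repeat the identical step in other jobs with an alias.
      - *checkout
      - run: echo "build and test here"

  # What does NOT work: copying a step with a merge key and overriding one
  # field. GitHub Actions rejects '<<', so steps that differ per job (such
  # as the ccache step with its job-specific key) have to be spelled out.
  #   - <<: *restore_ccache
  #     with:
  #       key: ccache-linux-${{ github.run_id }}
```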

> > + with:
> > + path: ${{ env.CCACHE_DIR }}
> > + key: ccache-sanitycheck-${{ github.run_id }}
> > + restore-keys: ccache-sanitycheck-
>
> Why is the key here the run id? Doesn't that mean that we will never have a
> precise cache match and that we will keep multiple versions of the cache
> around? That seems like a waste of cache space?
>
> For efficiency, particularly on cfbot, it seems like it could be useful to
> populate the cache of branches with the cache of the master branch. For that
> we'd need the branch name in the key. Which I think would also good for
> postgres/postgres, as we currently have a lot of interference between runs on
> the main and the REL_XY_STABLE branches.

I think that is the standard approach. When there is an exact cache
hit, actions/cache doesn't save the cache again, so it would never be
refreshed. Having ${{ github.run_id }} in the key makes sure we never
get an exact hit, so the cache is always refreshed. This sounds
wasteful, but that is my understanding :(

I can implement something like this:

- name: Restore ccache
  uses: actions/cache/restore@v5
  with:
    path: ${{ env.CCACHE_DIR }}
    key: ccache-sanitycheck-master
    restore-keys: |
      ccache-sanitycheck-${{ github.ref_name }}
      ccache-sanitycheck-

- name: Save ccache
  if: always()
  uses: actions/cache/save@v5
  with:
    path: ${{ env.CCACHE_DIR }}
    key: ccache-sanitycheck-${{ github.ref_name }}-${{ github.run_id }}

So it will first look for master's cache, then the current branch's
cache, and lastly any sanitycheck cache that is available. Do you
prefer that?

> > + - name: Prepare workspace
> > + run: |
> > + whoami
> > + useradd -m postgres
> > + chown -R postgres:postgres .
> > + mkdir -p "$CCACHE_DIR"
> > + chown -R postgres:postgres "$CCACHE_DIR"
> > + # Can't change the container's kernel.core_pattern; the postgres
> > + # user can't write to / normally. Make / writable.
> > + chown root:postgres /
> > + chmod g+rwx /
>
> Why not just always use a privileged container?

Done.

> > + - name: Configure
> > + run: |
> > + su postgres <<-'EOF'
> > + set -e
> > + meson setup \
> > + --buildtype=debug \
> > + --auto-features=disabled \
> > + -Ddefault_library=shared \
> > + -Dtap_tests=enabled \
> > + build
> > + EOF
> > +
> > + - name: Build
> > + run: |
> > + su postgres <<EOF
> > + set -e
> > + ninja -C build -j${BUILD_JOBS} ${MBUILD_TARGET}
> > + EOF
>
> Should we have an explicit cache upload step here? Or are upload steps run
> unconditionally?

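As explained above, the upload happens in the implicit post-job step of
actions/cache, and having ${{ github.run_id }} in the cache key makes
sure that step always saves a new cache instead of skipping it on an
exact hit.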

> > + # Run a minimal set of tests. The main regression tests take too long
> > + # for this purpose. For now this is a random quick pg_regress style
> > + # test, and a tap test that exercises both a frontend binary and the
> > + # backend.
> > + - name: Test
> > + run: |
> > + su postgres <<EOF
> > + set -e
> > + ulimit -c unlimited
> > + meson test ${MTEST_ARGS} --suite setup
> > + meson test ${MTEST_ARGS} --num-processes ${TEST_JOBS} \
> > + cube/regress pg_ctl/001_start_stop
> > + EOF
> > +
> > + - name: Core backtraces
> > + if: failure()
> > + run: |
> > + mkdir -m 770 /tmp/cores
> > + find / -maxdepth 1 -type f -name 'core*' -exec mv '{}' /tmp/cores/ \;
> > + src/tools/ci/cores_backtrace.sh linux /tmp/cores
> > +
> > + - name: Upload logs
> > + if: failure()
> > + uses: actions/upload-artifact@v7
> > + with:
> > + name: sanitycheck-logs-${{ github.run_id }}
> > + path: |
> > + build*/testrun/**/*.log
> > + build*/testrun/**/*.diffs
> > + build*/testrun/**/regress_log_*
> > + build*/meson-logs/*.txt
> > + if-no-files-found: ignore
>
> I think this really should be in a yaml anchor, we have a few somewhat
> different versions of this now.

Same thing: since merge keys are not supported, we can't anchor the
whole step and override only the parts that differ. I created this
variable instead:

_LOG_PATHS: &log_paths |
  build*/testrun/**/*.log
  build*/testrun/**/*.diffs
  build*/testrun/**/regress_log_*
  build*/meson-logs/*.txt

and used it as the 'path' of the 'Upload logs' steps.
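Roughly how the anchored scalar is defined once and referenced from an upload step. Placing _LOG_PATHS under the workflow-level 'env:' block is an assumption made for this sketch (a key there is also exported as an ordinary environment variable, which is harmless here); the job is trimmed to the one relevant step.

```yaml
env:
  # Assumed placement: the anchor only needs to appear before its aliases.
  _LOG_PATHS: &log_paths |
    build*/testrun/**/*.log
    build*/testrun/**/*.diffs
    build*/testrun/**/regress_log_*
    build*/meson-logs/*.txt

jobs:
  sanity-check:
    runs-on: ubuntu-latest
    steps:
      # ... checkout, build and test steps omitted ...
      - name: Upload logs
        if: failure()
        uses: actions/upload-artifact@v7
        with:
          name: sanitycheck-logs-${{ github.run_id }}
          # The alias expands to the same multi-line glob list in every job.
          path: *log_paths
          if-no-files-found: ignore
```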

> It's pretty annoying that the output of the failures isn't visible in the UI.
> Maybe we ought to print a few of the failures out or something?

We already use '--print-errorlogs'; do you mean something different?
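If the error logs printed by meson are still hard to find, one possible addition (a sketch, not something in v3) is a failure-only step that copies the regression diffs into the job summary page through the $GITHUB_STEP_SUMMARY file. It assumes a bash shell and the build*/testrun layout used by the upload step; the 64 KiB per-file cap is an arbitrary choice to keep the summary readable.

```yaml
      - name: Show regression diffs in job summary
        if: failure()
        shell: bash
        run: |
          # Find non-empty *.diffs files left behind by failed tests. The
          # '|| true' keeps a missing testrun directory from failing the
          # step, since GitHub runs bash with pipefail.
          { find build*/testrun -name '*.diffs' -size +0 2>/dev/null || true; } |
          while read -r f; do
            {
              echo "### $f"
              echo '~~~diff'
              # Truncate large diffs so one file can't dominate the summary.
              head -c 65536 "$f"
              echo
              echo '~~~'
            } >> "$GITHUB_STEP_SUMMARY"
          done
```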

> > +
> > + # SPECIAL:
> > + # - Uses address sanitizer (sanitizer failures are typically printed in
> > + # the server log)
> > + # - Configures postgres with a small segment size
> > + #
> > + # Enable a reasonable set of sanitizers. Use the linux task for that, as
> > + # it's one of the fastest tasks (without sanitizers). Also several of the
> > + # sanitizers work best on linux.
> > + #
> > + # The overhead of alignment sanitizer is low, undefined behaviour has
> > + # moderate overhead. Test alignment sanitizer in the meson task, as it
> > + # does both 32 and 64 bit builds and is thus more likely to expose
> > + # alignment bugs.
> > + #
> > + # Address sanitizer in contrast is somewhat expensive. Enable it in the
> > + # autoconf task, as the meson task tests both 32 and 64bit.
>
> I wonder if we should split the meson task into two, one for 32bit and one for
> 64bit. The concurrency limits for public repos are high enough for that to
> seem like a reasonable tradeoff? There's no work, other than the repo
> checkout, shared between them.

Done.

> > + # disable_coredump=0, abort_on_error=1: for useful backtraces in case of crashes
> > + # print_stacktraces=1,verbosity=2, duh
> > + # detect_leaks=0: too many uninteresting leak errors in short-lived binaries
> > + linux-autoconf:
> > + name: Linux - Debian Trixie - Autoconf
> > + needs: [setup, sanity-check]
> > + if: |
> > + !cancelled() &&
> > + needs.setup.outputs.linux == 'true' &&
> > + needs.sanity-check.result != 'failure'
> > + runs-on: ubuntu-latest
> > + timeout-minutes: 60
> > + container:
> > + image: ${{ needs.setup.outputs.linux_ci_image }}
> > + # Share the host PID + IPC namespaces. 017_shm.pl rapidly creates,
> > + # kill9's, and restarts postgres; with the container's small PID
> > + # space a new postgres can recycle the dead postmaster's PID before
> > + # pg_ctl's postmaster.pid check notices, producing spurious "node X
> > + # is already running" failures. SysV shm in the test also relies on
> > + # host-like IPC behavior.
> > + #
> > + # --ulimit raises memlock and core dump size. Memlock is needed for
> > + # running the AIO tests.
> > + #
> > + # --privileged is needed so the prepare step can write to sysctls
> > + # under /proc/sys (it's mounted read-only without it). We use it to
> > + # set kernel.core_pattern.
> > + options: --pid=host --ipc=host --ulimit memlock=-1:-1 --privileged
> > + env:
> > + BUILD_JOBS: 4
> > + TEST_JOBS: 8
> > + CCACHE_DIR: /tmp/ccache_dir
> > + DEBUGINFOD_URLS: "https://debuginfod.debian.net"
> > +
> > + SANITIZER_FLAGS: -fsanitize=address
> > + UBSAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:verbosity=2
> > + ASAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0:detect_stack_use_after_return=0
> > + CFLAGS: -Og -ggdb -fno-sanitize-recover=all -fsanitize=address
> > + CXXFLAGS: -Og -ggdb -fno-sanitize-recover=all -fsanitize=address
> > + LDFLAGS: -fsanitize=address
> > + CC: ccache gcc
> > + CXX: ccache g++
>
> There's a fair bit of stuff shared between the meson/autoconf linux
> tasks. Previously they used a matrix to reduce that a *bit*. But now it's
> entirely duplicated, including stuff that doesn't apply to the current job
> (e.g. UBSAN_OPTIONS/ASAN_OPTIONS). And blocks like the following:
>
>
> > + - name: Prepare workspace
> > + run: |
> > + useradd -m postgres
> > + chown -R postgres:postgres .
> > + mkdir -p "$CCACHE_DIR"
> > + chown -R postgres:postgres "$CCACHE_DIR"
> > + mkdir -m 770 /tmp/cores
> > + chown root:postgres /tmp/cores
> > + sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
> > +
> > + # Hosts for the load balance test
> > + cat >> /etc/hosts <<-EOF
> > + 127.0.0.1 pg-loadbalancetest
> > + 127.0.0.2 pg-loadbalancetest
> > + 127.0.0.3 pg-loadbalancetest
> > + EOF

I found that we can use a matrix, so I merged all the Linux tasks into
one job. I am not sure that is better, though, since it is a bit harder
to read now.
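A trimmed sketch of the matrix shape, assuming each Linux variant differs only in a few settings: per-variant values live in 'include:' entries and are read back through the matrix context, so options such as the sanitizer flags only reach the variant that needs them. The entry names, matrix keys and configure invocations are illustrative rather than copied from v3; the flag values come from the quoted autoconf task and the io_uring setting mentioned above.

```yaml
jobs:
  linux:
    name: Linux - Debian Trixie - ${{ matrix.name }}
    runs-on: ubuntu-latest
    strategy:
      # Let the other variants finish even if one of them fails.
      fail-fast: false
      matrix:
        include:
          - name: Autoconf
            build_system: autoconf
            cflags: -Og -ggdb -fno-sanitize-recover=all -fsanitize=address
            ldflags: -fsanitize=address
            initdb_extra_opts: ''
          - name: Meson 64-bit
            build_system: meson
            cflags: -Og -ggdb
            ldflags: ''
            initdb_extra_opts: -c io_method=io_uring
          - name: Meson 32-bit
            # 32-bit specific settings are omitted from this sketch.
            build_system: meson
            cflags: -Og -ggdb
            ldflags: ''
            initdb_extra_opts: -c io_method=io_uring
    env:
      CC: ccache gcc
      CFLAGS: ${{ matrix.cflags }}
      LDFLAGS: ${{ matrix.ldflags }}
      PG_TEST_INITDB_EXTRA_OPTS: ${{ matrix.initdb_extra_opts }}
    steps:
      - uses: actions/checkout@v6
      - name: Configure (autoconf)
        if: matrix.build_system == 'autoconf'
        run: ./configure --enable-debug --enable-cassert --enable-tap-tests
      - name: Configure (meson)
        if: matrix.build_system == 'meson'
        run: meson setup --buildtype=debug -Dtap_tests=enabled build
```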

> > + # Install dependencies via Homebrew rather than Macports. On stock
> > + # GH runners macports requires a heavy bootstrap, and the relevant
> > + # Postgres deps are all available in brew.
>
> What does "heavy bootstrap" mean?

I used MacPorts in my first version, and it took ~10 minutes to
download MacPorts. I think MacPorts makes sense if we can cache it like
we did on Cirrus, so I spent some time on that.

After spending some time on it, I was able to make it work. The first
run's dependency install still takes ~10 minutes, since there is no
MacPorts cache yet, but in subsequent runs the install only takes ~5
seconds.
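For readers unfamiliar with the Cirrus setup, a rough GitHub Actions equivalent of a cached MacPorts install could look like the following. This is a sketch under assumptions: it reuses src/tools/ci/ci_macports_packages.sh and the MACPORTS_CACHE variable the way the Cirrus configuration does, the package list is a shortened example, and the cache key mirrors the Cirrus fingerprint (macOS major version, package list, script contents). The actual v3 steps may differ. Note that actions/cache only saves from its post-job step when the job succeeds, so a failing first run does not populate the cache.

```yaml
jobs:
  macos:
    runs-on: macos-latest
    env:
      # Example subset only; not the full package list used by the real task.
      MACOS_PACKAGE_LIST: >-
        ccache
        meson
        openldap
        p5.34-io-tty
        p5.34-ipc-run
        python312
        tcl
    steps:
      - uses: actions/checkout@v6

      - name: Compute MacPorts cache key
        id: macports-key
        run: |
          # Directory the install script is assumed to cache into.
          echo "MACPORTS_CACHE=$HOME/macports-cache" >> "$GITHUB_ENV"
          # Same inputs as the Cirrus fingerprint: macOS major version,
          # package list and the install script itself.
          os_major=$(sw_vers -productVersion | cut -d. -f1)
          fp=$({ echo "$MACOS_PACKAGE_LIST"; cat src/tools/ci/ci_macports_packages.sh; } | shasum -a 256 | cut -c1-16)
          echo "key=macports-${os_major}-${fp}" >> "$GITHUB_OUTPUT"

      - name: Restore MacPorts cache
        uses: actions/cache@v5
        with:
          path: ${{ env.MACPORTS_CACHE }}
          key: ${{ steps.macports-key.outputs.key }}

      - name: Install dependencies
        run: |
          # Assumed to reuse a restored cache; on a cache miss this is the
          # slow (~10 minute) bootstrap path.
          sh src/tools/ci/ci_macports_packages.sh $MACOS_PACKAGE_LIST
```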

> > + - name: Install dependencies
> > + run: |
> > + brew update
> > + brew install \
> > + ccache meson openldap python@3.12 tcl-tk
> > + # IPC::Run via cpanm (system perl)
> > + sudo cpan -T -i IPC::Run IO::Tty
>
> We do spend ~95s on this every run, that's not nothing. And it puts a bunch of
> load onto the brew's mirrors to do that every run.

You are right. The macOS task uses MacPorts now.

> > + - name: Test world
> > + run: |
> > + ulimit -c unlimited
> > + ulimit -n 1024
> > + meson test ${MTEST_ARGS} --num-processes ${TEST_JOBS}
>
> I'd re-add the comments that were in .cirrus.yml about this.

Done.

> > + windows-vs:
> > + name: Windows - Server 2022, VS 2022 - Meson & ninja
> > + needs: [setup, sanity-check]
> > + if: |
> > + !cancelled() &&
> > + needs.setup.outputs.windows == 'true' &&
> > + needs.sanity-check.result != 'failure'
> > + runs-on: windows-2022
> > + timeout-minutes: 60
> > + env:
> > + TEST_JOBS: 8
> > + # Avoid port conflicts between concurrent tap tests
> > + PG_TEST_USE_UNIX_SOCKETS: 1
> > + PG_REGRESS_SOCK_DIR: 'c:\pgsock\'
>
> At least my editor gets confused by the \', thinking it's escaping the '. As
> everything just works without the trailing \, I'd go that way.

Done.

> > + # The TAP tests build an initdb template under build/tmp_install and
> > + # then `robocopy` it into per-test data directories. Robocopy with the
> > + # default /COPY:DAT flag doesn't copy ACLs — destinations inherit from
> > + # their parent dir. On GitHub-hosted Windows runners the workspace's
> > + # inherited ACL grants Administrators:(F) and Users:(RX) but does NOT
> > + # grant the runner user (runneradmin) directly. That matters because
> > + # pg_ctl on Windows uses CreateRestrictedProcess to drop admin
> > + # privileges from postmaster, so the postmaster process has the user
> > + # SID in its token but no longer the Administrators group — leaving it
> > + # with only "Users:(RX)" on pg_control and friends, which causes
> > + # "PANIC: could not open file global/pg_control: Permission denied".
> > + #
> > + # Fix it once on the workspace dir with (OI)(CI) inheritance flags so
> > + # every file/dir created underneath gets an explicit grant for the
> > + # current user.
> > + - name: Grant workspace ACL to runner user
> > + shell: pwsh
> > + run: |
> > + icacls "${{ github.workspace }}" /grant "${env:USERNAME}:(OI)(CI)F" /Q | Out-Null
> > + Write-Host "Granted Full Control to $env:USERNAME on ${{ github.workspace }}"
>
> Perhaps this would be better to fix by changing the robocopy flags?

I couldn't fix this with robocopy flags. I tried /COPYALL together with
/SECFIX, but that didn't work.

> > + # postgres' plpython3u loads python3.dll (the stable-ABI forwarder)
> > + # which in turn loads whichever python3NN.dll the Windows loader finds
> > + # first on PATH. On windows-2022 `C:\Program Files\Mercurial\` ships
> > + # its own python3.dll + python39.dll and appears on PATH *before* the
> > + # hostedtoolcache Python 3.12 — so without intervention the backend
> > + # ends up running Python 3.9 while postgres' stdlib search uses 3.12,
> > + # producing `ImportError: cannot import name 'text_encoding' from
> > + # 'io'` (the 3.12 `io.py` calling into 3.9's `_io`).
> > + #
> > + # Pin PYTHONHOME to the Python 3.12 prefix, and prepend that prefix
> > + # to PATH so its python3.dll wins the DLL search.
> > + - name: Pin Python prefix on PATH and PYTHONHOME
> > + shell: pwsh
> > + run: |
> > + $prefix = (python -c "import sys; print(sys.prefix)").Trim()
> > + Add-Content $env:GITHUB_ENV "PYTHONHOME=$prefix"
> > + Add-Content $env:GITHUB_PATH $prefix
> > + Write-Host "PYTHONHOME=$prefix"
> > + Write-Host "Prepended $prefix to PATH"
>
> GRJGJKLJKJDFJKDF.

I re-checked this since Jelte wasn't completely sure about it [2], but
unfortunately it is correct :(

> > + - name: Install dependencies
> > + shell: pwsh
> > + run: |
> > + choco install -y --no-progress --limitoutput diffutils winflexbison
> > + # meson + ninja aren't preinstalled on windows-2022. Install via pip
> > + python -m pip install --upgrade meson ninja
> > +
> > + # OpenSSL 1.1 via the slproweb installer (pinned to match the
> > + # version used elsewhere in postgres CI).
> > + curl.exe -fsSL -o openssl-setup.exe https://slproweb.com/download/Win64OpenSSL-1_1_1w.exe
> > + Start-Process -Wait -FilePath ./openssl-setup.exe `
> > + -ArgumentList '/DIR=c:\openssl\1.1\ /VERYSILENT /SP- /SUPPRESSMSGBOXES'
> > + # The slproweb installer puts libcrypto-1_1-x64.dll / libssl-1_1-x64.dll
> > + # in c:\openssl\1.1\bin\ and updates the system PATH. GH Actions
> > + # snapshots PATH at job start though, so the running job won't
> > + # see those DLLs and initdb.exe would crash silently at runtime.
> > + # Push the bin dir onto GITHUB_PATH so it persists for later steps.
> > + Add-Content $env:GITHUB_PATH "c:\openssl\1.1\bin"
>
> I don't like that much, but I'm not sure we have a better alternative
> short-term.

Using Chocolatey for this would be a nice alternative, although you
already said that Chocolatey sometimes takes too much time. I am
planning to spend time on it unless we are going to use our own
Windows containers.

> > + windows-mingw:
> > + name: Windows - Server 2022, MinGW64 - Meson
> > + needs: [setup, sanity-check]
> > + if: |
> > + !cancelled() &&
> > + needs.setup.outputs.mingw == 'true' &&
> > + needs.sanity-check.result != 'failure'
> > + runs-on: windows-2022
> > + timeout-minutes: 60
> > + env:
> > + TEST_JOBS: 4 # higher concurrency causes occasional failures
> > + PG_TEST_USE_UNIX_SOCKETS: 1
> > + PG_REGRESS_SOCK_DIR: 'c:\pgsock\'
> > + TAR: "c:/windows/system32/tar.exe"
> > + # for mingw plpython to find its installation
> > + PYTHONHOME: D:/a/_temp/msys64/ucrt64
> > +
> > + MSYS: winjitdebug
> > + CHERE_INVOKING: 1
> > + MESON_FEATURES: >-
> > + -Dnls=disabled
>
> Missing comments from .cirrus.tasks.yml

Done.

v3 is attached. Just a quick note: v3 also includes the reviews and
feedback from Zsolt [3] and Peter [4]. I will reply to them after
sending this.

GitHub Actions run with v3 applied:
https://github.com/nbyavuz/postgres/actions/runs/26587973538

[1] https://github.com/actions/runner/issues/1182
    https://github.com/orgs/community/discussions/185877
[2] https://postgr.es/m/CAGECzQQBCF%3DHSk4eCc1fEYTpCt59rgpcwWp47%2B6M-CDMYEaM2A%40mail.gmail.com
[3] https://postgr.es/m/CAN4CZFO4usEzFQoYzEywvOgoagW%3DU4yhpB4Oq-a7bUCR53djHA%40mail.gmail.com
[4] https://postgr.es/m/3daa29a4-6a08-41c1-8a6a-53ba8cd3c7fb%40eisentraut.org

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment: v3-0001-Add-GitHub-Actions-yaml-file.patch (text/x-patch, 38.6 KB)
