| From: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org> |
| Subject: | Re: Heads Up: cirrus-ci is shutting down June 1st |
| Date: | 2026-06-03 12:46:56 |
| Message-ID: | CAKZiRmy+8t7W7R9r_J1PMjUcsYqSkuSSMvLESu1GEQgT29zZCw@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
On Tue, Jun 2, 2026 at 8:38 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2026-06-01 12:01:58 +0200, Jakub Wartak wrote:
> > So I've spent half of day on trying to see what makes the tests so slow at
> > least in my case. I can also confirm %CPU combined (with high 33% sys).
>
> Was this locally on your machine? I assume that's without enabling
> sanitizers?
Yup.
> In CI the bottleneck clearly is CPU at the moment, due to the relatively low
> number of cores.
>
> To reduce IO, one pretty significant thing we can do is to reduce the segment
> size used during tests. Creating lots of 16MB segments when most of them are
> only very partially used isn't free.
Right, saw that, nice.
> > 0. baseline was ~71s (stuff already hot)
> > 1a. down to 64s with dirtywriteback tune (and mostly to avoid NVMe/SSD wear)
> > 1b. ~65s with tmpfs, so I've left using dirtywriteback sysctls:
> > sudo mount -t tmpfs -o size=4G,uid=XXX,mode=755 tmpfs build/tmp_install
> > sudo mount -t tmpfs -o size=16G,uid=XXX,mode=755 tmpfs /build/testrun
>
> I don't think we should do that, real FS behaviour is something we do IMO want
> to test.
Ack.
> > 1,100 pg_upgrade
> > 896 isolation
> > 694 pg_dump
> > 682 pg_basebackup
> >
> > Fixing above subscription to ~5000 conns did not gain much (well it saved
> > 5% of runtime 43s -> 41s). It's literally 10k lines of
> > s/$node_subscriber->safe_psql/sub_bg->query_safe/g across dozens of files
> > in src/test/subscription/t/). Too big for review and I'm not sharing as
> > it could contain errors.
>
> Did you test the effect of those changes on windows (via CI)? I'd expect that
> big a reduction to have a substantially bigger effect there.
No, I did not, and I've already wiped the changes. It was just a probe for
any simple quick wins...
> > 5. Spotted that we do plenty of initdb and cached-initdb (cp), so I had idea
> > about XFS's cp reflinks=always in build/, but I couldn't do that without
> > /dev/loop, so apparently XFS (reflink=1) vs ext4(reflink=0) halves number
> > of writes while even still on /dev/loop device, but that somehow
> > does not directly contribute to duration of the test (well we are
> > bottlenecked on CPU anyway, so this is just smarter? way of avoiding I/O;
> > maybe with cold-caches and on real VMs running with XFS would be faster)
> >
> > +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> > @@ -687,7 +687,13 @@ sub init
> > }
> > else
> > {
> > - @copycmd = qw(cp -RPp);
> > + @copycmd = qw(cp --reflink=always -RPp);
>
> Afaict cp uses reflinks automatically by default, if the filesystem supports
> it. On CI it's not supported due to ext4, but locally it seems to work for
> me.
Yeah, it does. I just wanted to be doubly sure, but then I realized that in CI
we are on an overlay fs on top of the host's ext4 :( It's a pity, because that
cp could be instant (even CREATE DATABASE with file_copy_method=clone). Even
with --wal-segsize=1 an empty cluster takes ~32MB (3x8MB), and even rough
estimates of the number of (cached) initdb calls give huge numbers:
$ grep -r -A 5 'PostgreSQL::Test::Cluster->new' src contrib |
    grep -Po '\->init[a-z_]*' | sort | uniq -c
341 ->init
98 ->init_from_backup
So that's roughly 400 * 32MB = 12800 MB? But I get the point of using a real
fs; it's just that we should have some option of using throwaway filesystems
(maybe we even do, but on own/dedicated runners).
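
For a slightly less rough figure, the exact counts from the grep output can be
summed instead of rounding to 400. A minimal sketch, assuming the `uniq -c`
output is piped in on stdin; the helper name estimate_copy_mb is made up for
illustration, and the 32MB per-cluster size is just the estimate from above,
not a measured value:

```shell
# Hypothetical helper: sum the counts in the first column of `uniq -c`
# output (read from stdin) and multiply by an assumed per-cluster size in MB.
estimate_copy_mb() {
    per_cluster_mb=$1
    awk -v mb="$per_cluster_mb" '{ n += $1 } END { print n * mb }'
}

# Feed it the two counts reported by the grep pipeline (341 and 98).
printf '    341 ->init\n     98 ->init_from_backup\n' | estimate_copy_mb 32
```

With 341 + 98 = 439 call sites this prints 14048, i.e. about 14GB. It counts
call sites in the source, not actual calls at run time, so it is only an
order-of-magnitude figure.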
-J.