| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
| Cc: | Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org> |
| Subject: | Re: Heads Up: cirrus-ci is shutting down June 1st |
| Date: | 2026-06-10 14:12:26 |
| Message-ID: | ypf7mxnlnxqql45unfklinvyi54wxzjnv6kenohljeaut5mcdq@p3gm43q2gzvn |
| Lists: | pgsql-hackers |
Hi,
On 2026-06-10 13:13:49 +0200, Jakub Wartak wrote:
> On Tue, Jun 9, 2026 at 2:14 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2026-06-09 12:32:31 +0200, Jakub Wartak wrote:
> > > And I've bumped TEST_JOBS 4->8 (even with 4 VCPUs), because my local runs
> > > showed in taskmgr that after quite some time we have ended up using just
> > > ~40% CPU (also with 4 VCPUs) while not doing I/O (this is somehow contrary
> > > to what Andres was stating earlier).
> >
> > FWIW, I only measured this for linux, not for windows. On linux it was easy
> > to do
> >
> > + vmstat -y -n -w 1 > vmstat.log &
> > +
> > + meson test ${{env.MTEST_ARGS}} --num-processes ${{env.TEST_JOBS}} --no-suite setup ${{env.MTEST_TARGET}}
> > +
> > + killall iostat vmstat || true
> > +
> > + - name: Upload stats
> > + uses: actions/upload-artifact@v7
> > + with:
> > + path: |
> > + iostat.log
> > + vmstat.log
> >
> > Which showed that there is very little idle CPU other than during first few
> > seconds and at the end.
> >
> > I don't know how to do that on windows... I'm sure one can do it, with ETW or
> > such, but...
>
> Oh, I thought you guys there have some secret keys to access GH(MS)
> stuff :)
I don't, and I doubt I could have. Nor would I even know where to ask :)
> OK, so I've gathered similar data (using "typeperf", learned that the thing
> exists just today).
Nice.
> Attached are: patch how it was gathered, raw CSV data, and most importantly
> graph.
Looking at the raw data, I think something must not be quite right. Note how
low the absolute read/write IO numbers are. Is it possible that that's for the
C:/ disk, but that we're doing IO on D:/?
How exactly did you translate the csv data to %cpu utilization?
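For reference, the most direct translation I can think of is sketched below. It is only an illustration: it assumes the CSV contains a column whose name ends in `\% Processor Time` (as typeperf emits for `\Processor(_Total)\% Processor Time`) and that the values are already percentages. The function names are made up for this example, and the actual method used for the graph may well differ.

```python
import csv
import io


def cpu_samples(csv_text, counter_suffix=r"\% Processor Time"):
    """Return [(timestamp, percent)] for the first column matching counter_suffix.

    Assumes typeperf-style CSV: the first column is the timestamp and the
    header names each counter by its full path.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    matches = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    if not matches:
        raise ValueError(f"no column ending in {counter_suffix!r}")
    col = matches[0]

    samples = []
    for row in reader:
        try:
            samples.append((row[0], float(row[col])))
        except (IndexError, ValueError):
            # Skip short rows and blank values instead of failing.
            continue
    return samples


def mean_cpu(samples):
    """Average CPU utilization in percent over all samples."""
    return sum(pct for _, pct in samples) / len(samples)
```

If the numbers were derived some other way (e.g. from idle time or from per-core counters), the graph could look quite different, which is why I'm asking.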
> We were both right and wrong. It is either CPU bottleneck, but also
> if the I/O is involved the CPU drops to <20% in case of runner #1 (same
> happens with runner#2 but for short time of 2 mins). Pretty much had
> similar local Windows behavior.
Kinda looks like what we might want is to increase the times / amounts
equivalent to
/proc/sys/vm/{dirty_expire_centisecs,dirty_writeback_centisecs,dirty_background_ratio,dirty_ratio}
But due to the issue mentioned above, I'm not sure we can conclude that much
yet.
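For anyone who wants to see the linux side of that comparison, here is a small sketch that reads the four knobs named above. The `read_dirty_settings` helper and its `base` parameter are made up for this example; it only reports the current values and says nothing about what the windows equivalents would be.

```python
from pathlib import Path

# The writeback knobs named above, relative to /proc/sys/vm.
DIRTY_KNOBS = (
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
    "dirty_background_ratio",
    "dirty_ratio",
)


def read_dirty_settings(base="/proc/sys/vm"):
    """Return {knob: int value, or None if the file is missing/unreadable}."""
    settings = {}
    for knob in DIRTY_KNOBS:
        try:
            settings[knob] = int((Path(base) / knob).read_text().strip())
        except (OSError, ValueError):
            settings[knob] = None
    return settings


if __name__ == "__main__":
    for knob, value in read_dirty_settings().items():
        print(f"{knob} = {value}")
```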
> IMHO *if* we want to push that faster it would make some sense to eliminate
> that I/O (but after observing that matrix split trick I'm not so sure if it
> is worth investing more into it). We seem to drop CPU use every time the
> avg disk queue len >= 2.
I'm not that concerned about the VS runtime right now, due to the split, but
mingw very frequently is the slowest task (with an empty / inapplicable cache
it's compilerwarnings, but I have some pending improvements for that, by
converting it to meson the worst case time halves). We can't just split all
tasks, that uses too many of the available "job slots".
> Or maybe offload that and ask GH folks to provide images with XFS and ReFS
> on D:\ by default instead ?
I suspect that will be a very heavy lift. That'd be a large change and there
are a lot of users of this stuff.
It's probably worth seeing what the times with a newer windows image are,
before we do much more.
> Alvaro had an idea here in [1] about instance reusing.
We have the ability to run instances against a running cluster already, but
only use that in one place. I was wondering about a meson test "setup" that
will only run tests that can *not* be run against a running instance.
With a bit of additional scripting (we need the ability to set LD_LIBRARY_PATH
in a cross-platform way; we have that in a bunch of places, just need to expose
it), that'd allow us to run all the meson-based tests that support it against
the running instance, and separately run all the tests that don't support that,
without duplication between the runs.
It's not really a fair comparison (due to what's running concurrently), but
here's the time for a few tests in running and a dedicated cluster:
70/398 postgresql:bloom / bloom/regress OK 1.97s 1 subtests passed
6/88 postgresql:bloom-running / bloom-running/regress OK 0.54s 1 subtests passed
68/398 postgresql:auto_explain / auto_explain/regress OK 1.96s 2 subtests passed
5/88 postgresql:auto_explain-running / auto_explain-running/regress OK 0.33s 2 subtests passed
77/398 postgresql:cube / cube/regress OK 2.27s 2 subtests passed
11/88 postgresql:cube-running / cube-running/regress OK 0.84s 2 subtests passed
Clearly we could gain some if we didn't run the tests that support running
against an existing cluster against separate clusters each.
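To put rough numbers on that, here is a small sketch that pairs up the six lines above and computes the per-test difference. The parsing regex and function names are made up for this example, and the caveat from above still applies: the two runs had different things running concurrently, so treat the ratios as indicative only.

```python
import re

# The six result lines quoted above, verbatim.
MESON_OUTPUT = """\
70/398 postgresql:bloom / bloom/regress OK 1.97s 1 subtests passed
6/88 postgresql:bloom-running / bloom-running/regress OK 0.54s 1 subtests passed
68/398 postgresql:auto_explain / auto_explain/regress OK 1.96s 2 subtests passed
5/88 postgresql:auto_explain-running / auto_explain-running/regress OK 0.33s 2 subtests passed
77/398 postgresql:cube / cube/regress OK 2.27s 2 subtests passed
11/88 postgresql:cube-running / cube-running/regress OK 0.84s 2 subtests passed
"""

LINE_RE = re.compile(r"postgresql:(?P<suite>[\w-]+) / \S+\s+OK\s+(?P<secs>\d+\.\d+)s")
SUFFIX = "-running"


def parse_timings(text):
    """Return {test: {"dedicated": secs, "running": secs}}."""
    timings = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        suite = m.group("suite")
        if suite.endswith(SUFFIX):
            name, kind = suite[: -len(SUFFIX)], "running"
        else:
            name, kind = suite, "dedicated"
        timings.setdefault(name, {})[kind] = float(m.group("secs"))
    return timings


def summarize(timings):
    """Return [(test, seconds saved, speedup factor)], sorted by test name."""
    rows = []
    for name, t in sorted(timings.items()):
        saved = round(t["dedicated"] - t["running"], 2)
        factor = round(t["dedicated"] / t["running"], 1)
        rows.append((name, saved, factor))
    return rows


if __name__ == "__main__":
    for name, saved, factor in summarize(parse_timings(MESON_OUTPUT)):
        print(f"{name}: {saved}s saved, {factor}x faster")
```

For these three tests that works out to between roughly 1.4s and 1.6s saved per test.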
Greetings,
Andres Freund