Re: Use windows VMs instead of windows containers on the CI

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Use windows VMs instead of windows containers on the CI
Date: 2023-01-11 00:12:46
Message-ID: CA+hUKGLRvp62DyrEAPmR51Bh5Q+fvZXoPn91oz4hfkcG=8p1Fw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 11, 2023 at 8:20 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2023-01-10 09:22:12 -0600, Justin Pryzby wrote:
> > > There is more than 2x speed gain when VMs are used.
> >
> > One consideration is that if windows runs twice as fast, we'll suddenly
> > start using twice as many resources at cirrus/google/amazon - the
> > windows task has been throttling everything else. Not sure if we should
> > do anything beyond the limits that cfbot already uses.
>
> I'm not sure we would. cfbot has a time based limit for how often it tries to
> rebuild entries, and I think we were just about keeping up with that. In which
> case we shouldn't, on average, schedule more jobs than we currently
> do. Although peak "job throughput" would be higher.
>
> Thomas?

It currently tries to re-test each patch every 24 hours, but doesn't
achieve that. It looks like it's currently re-testing every ~30
hours. Justin's right, we'll consume more non-Windows resources if
Windows speeds up, but not 2x, more like 1.25x when cfbot's own
throttling kicks in. Or I could change the cycle target to 36 or 48
hours, to spread the work out more.

Back-of-a-napkin maths:

* there are currently 240 entries in a testable status
* it takes ~0.5 hours to test (because that's the slow Windows time)
* therefore it takes ~120 hours to test them all
* but we can do 4 at a time, so that's ~30 hours to get through them
  all and start again
* that matches what we see:

cfbot=> select created - lag(created) over (order by created)
        from branch where submission_id = 4068;
       ?column?
-----------------------

 1 day 06:30:00.265047
 1 day 05:43:59.978949
 1 day 04:13:59.754048
 1 day 05:28:59.811916
 1 day 07:00:00.651655
(6 rows)
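
For anyone who wants to check the napkin maths against the query output, here is a rough Python sketch. The function name and layout are made up for illustration and are not part of cfbot; the inputs are the figures quoted above, and the intervals are copied from the psql output.

```python
from datetime import timedelta

def cycle_hours(entries, hours_per_test, concurrent_jobs):
    """Hours needed to test every entry once (hypothetical helper, not cfbot code)."""
    return entries * hours_per_test / concurrent_jobs

# Figures from the napkin maths: 240 entries, ~0.5 h each, 4 jobs at a time.
estimated = cycle_hours(240, 0.5, 4)

# Intervals between successive test runs of submission 4068, from the query output.
observed = [
    timedelta(days=1, hours=6, minutes=30, seconds=0, microseconds=265047),
    timedelta(days=1, hours=5, minutes=43, seconds=59, microseconds=978949),
    timedelta(days=1, hours=4, minutes=13, seconds=59, microseconds=754048),
    timedelta(days=1, hours=5, minutes=28, seconds=59, microseconds=811916),
    timedelta(days=1, hours=7, minutes=0, seconds=0, microseconds=651655),
]
mean_observed = sum(observed, timedelta()).total_seconds() / len(observed) / 3600

print(f"estimated cycle: {estimated:.1f} h")
print(f"observed mean:   {mean_observed:.1f} h")
```

The first row of the query output is empty because lag() has no previous row to subtract, so only five intervals go into the mean.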

If, with this change, we can test in only ~0.25 hours, then we'll only
need 60 hours of Cirrus time to test them all. With a target of
re-testing every 24 hours, it should now only have to run ~2.5 jobs at
all times. Having free slots would be kind to Cirrus, and also lower
the latency when a new patch is posted (which currently has to wait
for a free slot before it can begin). Great news.
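
The projection can be sketched the same way. This is only an illustration under the assumptions stated in this mail (240 entries, ~0.25 hours per test, a 24 hour target); the helper name is hypothetical, and reading the 1.25x figure as the current ~30 hour cycle divided by the 24 hour target is my interpretation of the numbers rather than something cfbot computes.

```python
def jobs_needed(entries, hours_per_test, target_cycle_hours):
    """Average number of concurrently running jobs needed to hit the target cycle
    (hypothetical helper, not cfbot code)."""
    return entries * hours_per_test / target_cycle_hours

# With the faster Windows task: 240 entries at ~0.25 h each, re-tested every 24 h.
total_hours = 240 * 0.25
average_jobs = jobs_needed(240, 0.25, 24)

# Assumed reading of "1.25x": moving from a ~30 h cycle to the 24 h target
# means each patch is tested 30/24 times as often.
throughput_factor = 30 / 24

print(f"total Cirrus time per cycle: {total_hours:.0f} h")
print(f"average concurrent jobs:     {average_jobs}")
print(f"throughput increase:         {throughput_factor}x")
```

Changing the last argument to 36 or 48 shows how much a longer cycle target would spread the work out.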
