Re: More efficient build farm animal wakeup?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: More efficient build farm animal wakeup?
Date: 2022-11-20 03:56:20
Message-ID: CA+hUKGJtZ+mnZMCKWi=7PFaVS-C3o6BagBS6vK8ORhB9nKOL1A@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 20, 2022 at 1:35 AM Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> tl;dr: it's not there now, but yes, if we can find a smart way for the bf clients to consume it, it is something we could build and deploy fairly easily.

Cool -- it sounds a lot like you've thought about this already :-)

About the client: currently run_branches.pl makes an HTTP request for
the "branches of interest" list. That seems like a natural point for
a long poll. I don't think it'd have to be much smarter than it is
today; it'd just have to POST the commits it already has, I think.

Perhaps as a first step, the server could immediately report which
branches to bother fetching, considering the client's existing
commits. That'd almost always be none, but ~11.7 times per day a new
commit shows up, and once a year there's a new interesting branch.
That would avoid the 6 git fetches that usually follow in the common
case, which admittedly might not be a change worth making on its own.
After all, the git fetches are probably quite similar HTTP requests
themselves, except that there are 6 of them, one per branch, and they
hit the public git server instead of some hypothetical buildfarm
endpoint.
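The server-side decision for that first step could be as simple as a
commit-by-commit comparison. A minimal sketch (in Python rather than
the buildfarm's Perl, and with made-up names; no such endpoint exists
today):

```python
# Hypothetical server-side logic: the client POSTs the branch tips it
# already has, and the server answers with only the branches worth
# fetching -- usually none, occasionally one, and once a year a brand
# new stable branch the client has never seen.

def branches_to_fetch(client_heads, server_heads):
    """Return branches whose server-side tip differs from the client's.

    client_heads: {branch: commit} as POSTed by the client.
    server_heads: {branch: commit} as known to the server.
    A branch missing from client_heads (a new interesting branch) is
    always included.
    """
    return sorted(
        branch
        for branch, tip in server_heads.items()
        if client_heads.get(branch) != tip
    )
```

In the common case the two maps are identical and the client gets back
an empty list, skipping all the fetches.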

Then you could switch to long polling by letting the client say "if
currently none, I'm prepared to wait up to X seconds for a different
answer", assuming you know how to build the server side of that
(insert magic here). Of course, you can't make it too long or your
session might be dropped in the badlands between client and server,
but that's just a reason to make X configurable. I think RFC6202 says
that 120 seconds probably works fine across most kinds of links, which
means that you lower the total poll rate hitting the server, but--more
interestingly for me as a client--you minimise latency when something
finally happens. (With various keepalive tricks and/or heartbeat
streaming tricks you could possibly make it much higher, who knows...
but you'd have to set it very very low to do worse than what we're
doing today in total request count). Or maybe there is some existing
easy perl library that could be used for this (joke answer: cpan
install Twitter::API and follow @pg_commits).
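The client side of the long poll is just a loop around that blocking
request. A sketch, again in Python with the actual poll call injected
(so this shows only the loop structure, not any real buildfarm API):

```python
# Minimal long-polling client loop.  poll(wait) is assumed to block on
# the server for up to `wait` seconds and return a list of branches to
# build, or [] if nothing happened before the timeout.  RFC 6202
# suggests ~120 s survives most intermediaries, hence the default, but
# it stays configurable for links that drop idle connections sooner.

import time

def wait_for_work(poll, wait=120, sleep_on_error=10):
    """Call poll(wait) until it reports at least one branch to build."""
    while True:
        try:
            branches = poll(wait)
        except OSError:
            # Transient network trouble (dropped connection, timeout in
            # the badlands between client and server): back off and retry.
            time.sleep(sleep_on_error)
            continue
        if branches:
            return branches
```

Each timeout costs one cheap request, so even a conservative `wait`
beats today's once-a-minute barrage of git commands, while new commits
are picked up within seconds.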

By the way, I wrote this because I've just been re-establishing my
animal elver. It's set to run every minute by
cron, and spends nearly *half of each minute* running various git
commands when nothing is happening. Actually it's more than 6
connections to the server, because I see there's a fetch and an
ls-remote, so it's at least 12 (being unfamiliar with git plumbing, it
could be much more for all I know, and I kinda suspect so based on the
total run time). Admittedly network packets take a little while to
fly to my South Pacific location so maybe this looks less insane from
over there.

However, when I started this thread I was half expecting such a thing
to exist already, somewhere; I just haven't been able to find it
myself... Don't other people have this problem? Maybe everybody who
has this problem uses webhooks (git server post commit hook opens
connection to client) as you mentioned, but as you also mentioned
that'd never fly for our topology.
