Re: Buildfarm alarms

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: pgbuildfarm-members(at)pgfoundry(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Buildfarm alarms
Date: 2006-09-27 13:56:10
Message-ID: 451A82FA.4060708@dunslane.net
Lists: buildfarm-members pgsql-hackers

I wrote:
> Tom Lane wrote:
>
>> "Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
>>
>>> It could certainly be done. In general, I have taken the view that
>>> owners have the responsibility for monitoring their own machines.
>>>
>> Sure, but providing them tools to do that seems within buildfarm's
>> purview.
>>
>> For some types of failure, the buildfarm script could make a local
>> notification without bothering the server --- but a timeout on the
>> server side would cover a wider variety of failures, including "this
>> machine is dead and ought to be removed from the farm".
>>
>>
>
> Nothing gets removed. If a machine does not report on a branch for 30 days
> it drops off the dashboard, but apart from that it is a retained historic
> artifact. This buildup in history has been gradually slowing down the
> dashboard, in fact, but Ian Barwick tells me that he has rewritten my
> lousy SQL to make it fast again, so we'll soon get that working better.
>
> Anyway, I think we can do something fairly simple for these alarms. We'll
> just have a special stanza in the config file, and a cron job that checks,
> say, once a day, to see if we have exceeded the alarm period on any
> machine/branch combination.
>
>

OK, I have a gadget to do this in place.

It looks at the config of the last build registered on each branch for a
stanza called 'alerts' that would look like this:

    alerts => {
        HEAD          => { alert_after => 24,  alert_every => 48 },
        REL8_1_STABLE => { alert_after => 168, alert_every => 48 },
    }

The settings are in hours, so this says that if we haven't seen a HEAD
build in 1 day or a stable branch build in 1 week, alert the owner by
email, and keep repeating the alert in each case every 2 days.
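
To make the timing rules concrete, here is a rough sketch of the check in Python. This is not the actual server-side code; the function name alert_due, its arguments, and the choice to fire exactly at the threshold are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical Python mirror of the 'alerts' stanza; all values are in hours.
ALERTS = {
    "HEAD": {"alert_after": 24, "alert_every": 48},
    "REL8_1_STABLE": {"alert_after": 168, "alert_every": 48},
}


def alert_due(branch, last_build, last_alert, now, alerts=ALERTS):
    """Return True if the owner should be emailed about this branch now.

    last_build: datetime of the last build registered on the branch
    last_alert: datetime of the last alert sent, or None if none was sent
    """
    cfg = alerts.get(branch)
    if cfg is None:
        # No entry for this branch in the stanza: never alert.
        return False
    if now - last_build < timedelta(hours=cfg["alert_after"]):
        # A build was seen within the alert_after window.
        return False
    if last_alert is None or last_alert < last_build:
        # No alert has been sent since the last build: send the first one.
        return True
    # Otherwise repeat only once alert_every hours have passed.
    return now - last_alert >= timedelta(hours=cfg["alert_every"])
```

With the settings above, a HEAD build last seen 30 hours ago and no earlier alert would be due for an alert, while a REL8_1_STABLE build of the same age would not.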

If some intrepid buildfarm owner wants to test this out by using low
settings that would trigger an alert, that would be good. The cron job
runs every hour.

cheers

andrew
