Re: buildfarm animals and 'snapshot too old'

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: buildfarm animals and 'snapshot too old'
Date: 2014-05-20 13:35:59
Message-ID: 537B5A3F.5010808@dunslane.net
Lists: pgsql-hackers


On 05/20/2014 07:09 AM, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, May 19, 2014 at 7:58 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>> Well, the original code was put in for a reason, presumably that we were
>>> getting some stale data and wanted to exclude it. So I'm unwilling to throw
>>> it out altogether. If someone can propose a reasonable sanity check then I'm
>>> prepared to implement it.
>> While I generally agree that long-established code shouldn't be
>> changed for light or transient causes, I have to admit I'm pretty
>> skeptical about this particular instance. I can't think of any
>> particularly compelling reason why it's BAD for an old result to show
>> up. We now show the commit ID on the main page, so if you see 512abc4
>> in the middle of a bunch of ef9ab5f's, you'll notice. And if you
>> don't notice, so what?
> Robert's got a point here. In my usage, the annoying thing is not animals
> that take a long time to report in; it's the ones that lie about the
> snapshot time (which is how you get "512abc4 in the middle of a bunch of
> ef9ab5f's"). That is an issue of incorrect system clock, not of how long
> it takes to do the run. I wonder if the buildfarm script could be taught
> to get the timestamp from an NTP server somewhere? Or at least
> sanity-check the system clock reading by comparing it to the newest commit
> timestamp in the git repo.
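
Tom's second suggestion, checking the system clock against the newest commit timestamp in the git repo, can be sketched roughly as follows. This is a minimal Python illustration, not code from the buildfarm client. The names `newest_commit_time` and `clock_looks_sane` and the 5-minute tolerance are hypothetical. It uses the HEAD commit's committer timestamp as a stand-in for "newest commit". `git log -1 --format=%ct` prints that timestamp as Unix epoch seconds.

```python
import subprocess
from datetime import datetime, timedelta, timezone


def newest_commit_time(repo_dir: str) -> datetime:
    """Return the committer timestamp of HEAD in repo_dir as an aware UTC datetime."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "log", "-1", "--format=%ct"],
        capture_output=True, text=True, check=True,
    ).stdout
    return datetime.fromtimestamp(int(out.strip()), tz=timezone.utc)


def clock_looks_sane(local_now: datetime, newest_commit: datetime,
                     tolerance: timedelta = timedelta(minutes=5)) -> bool:
    """Return False when the local clock reads earlier than a commit that is
    already present in the repo. `tolerance` (a hypothetical default) allows
    for some skew on the committer's side. A clock that runs fast cannot be
    detected this way."""
    return local_now + tolerance >= newest_commit


# Example use (the repo path is hypothetical):
# ok = clock_looks_sane(datetime.now(timezone.utc), newest_commit_time("pgsql"))
```

The check is one-sided. It catches a clock that runs slow, which reports a time earlier than a commit it has already fetched. It cannot catch a clock that runs fast. It also assumes that the committer's own clock was correct.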

Showing the commit id is a relatively recent phenomenon, dating back to
July 2013. I agree with Robert that it might obsolete this check, so I
have disabled it for now. I have also disabled the other timestamp
check, on the time the client actually took the snapshot (as opposed to
the time of the last commit in the snapshot) for the
CLOBBER_CACHE_RECURSIVELY case.

Regarding clock skew, I think we can do better than what you suggest.
The web transaction code in the client adds its own timestamp just
before running the web transaction. It would be quite reasonable to
reject reports from machines with skewed clocks based on this value. I'm
not sure what an acceptable skew might be; somewhere in the range of 5 to
15 minutes seems about right.
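
The server-side rejection described above might look roughly like this. It is a sketch that rests on several assumptions: the client's timestamp reaches the server as an aware UTC datetime, and the server compares it with its own clock when the report arrives. The 10-minute limit, chosen from the 5 to 15 minute range, and the name `check_report_clock` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical limit: the message only suggests somewhere between 5 and 15 minutes.
MAX_SKEW = timedelta(minutes=10)


def check_report_clock(client_ts: datetime, server_ts: datetime,
                       max_skew: timedelta = MAX_SKEW) -> tuple[bool, timedelta]:
    """Compare the client-supplied timestamp with the server's receipt time.

    Returns (accept, skew), where skew = client_ts - server_ts. A negative skew
    means the client's clock reads behind the server's (or the upload took a
    while). The report is accepted when |skew| <= max_skew.
    """
    skew = client_ts - server_ts
    return abs(skew) <= max_skew, skew


server_now = datetime(2014, 5, 20, 13, 35, tzinfo=timezone.utc)
accept, skew = check_report_clock(server_now - timedelta(minutes=3), server_now)
# accept is True, skew is -3 minutes
```

The client stamps the report just before the upload starts, so a healthy client's timestamp will normally be slightly earlier than the server's. The limit therefore has to absorb that transfer time as well as any genuine skew.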

cheers

andrew
