Re: buildfarm animals and 'snapshot too old'

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Tomas Vondra" <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: buildfarm animals and 'snapshot too old'
Date: 2014-05-15 19:57:45
Message-ID: d1eacb16c245a0a4818b627461f3eab6.squirrel@sq.gransy.com
Lists: pgsql-hackers

On 15 May 2014, 19:46, Andrew Dunstan wrote:
>
> On 05/15/2014 12:43 PM, Tomas Vondra wrote:
>> Hi all,
>>
>> today I got a few of errors like these (this one is from last week,
>> though):
>>
>> Status Line: 493 snapshot too old: Wed May 7 04:36:57 2014 GMT
>> Content:
>> snapshot to old: Wed May 7 04:36:57 2014 GMT
>>
>> on the new buildfarm animals. I believe it was my mistake (incorrectly
>> configured local git mirror), but it got me thinking about how this will
>> behave with the animals running CLOBBER_CACHE_RECURSIVELY.
>>
>> If I understand the Perl code correctly, it does this:
>>
>> (1) update the repository
>> (2) run the tests
>> (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188)
>> (4) fail if older
>>
>> Now, imagine that the test runs for days/weeks. This pretty much means
>> it's wasted, because the results will be thrown away anyway, no?
>>
>
>
> The 24 hours runs from the time of the latest commit on the branch in
> question, not the current time, but basically yes.
>
> We've never had machines with runs that long. The longest in recent
> times has been friarbird, which runs CLOBBER_CACHE_ALWAYS and takes
> around 4.5 hours. But we have had misconfigured machines reporting
> unbelievable snapshot times. I'll take a look and see if we can tighten
> up the sanity check. It's worth noting that one thing friarbird does is
> skip the install-check stage - it's almost certainly not going to have
> terribly much interesting to tell us from that, given it has already run
> a plain "make check".
>
> How long does a CLOBBER_CACHE_RECURSIVELY run take? days or weeks seems
> kinda nuts.

I don't know. According to this comment from cache/inval.c, it's expected
to be far slower than CLOBBER_CACHE_ALWAYS, by at least ~100x (a factor of
10000 versus a factor of 100, both relative to a normal run).

/*
 * Test code to force cache flushes anytime a flush could happen.
 *
 * If used with CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS provides a
 * fairly thorough test that the system contains no cache-flush hazards.
 * However, it also makes the system unbelievably slow --- the regression
 * tests take about 100 times longer than normal.
 *
 * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY.  This
 * slows things by at least a factor of 10000, so I wouldn't suggest
 * trying to run the entire regression tests that way.  It's useful to try
 * a few simple tests, to make sure that cache reload isn't subject to
 * internal cache-flush hazards, but after you've done a few thousand
 * recursive reloads it's unlikely you'll learn more.
 */
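
As a rough back-of-the-envelope sketch of what those factors would mean
for the buildfarm: the snippet below scales friarbird's 4.5-hour
CLOBBER_CACHE_ALWAYS run by the ratio of the two slowdown factors from the
comment, and then applies the 24-hour rule as Andrew described it (measured
from the latest commit on the branch). The linear scaling is my assumption,
and the function names are made up for illustration; this is not the actual
pgstatus.pl logic.

```python
from datetime import datetime, timedelta

# Figures quoted in this thread and in the inval.c comment.
CCA_RUN_HOURS = 4.5     # friarbird with CLOBBER_CACHE_ALWAYS
CCA_SLOWDOWN = 100      # "about 100 times longer than normal"
CCR_SLOWDOWN = 10_000   # "at least a factor of 10000"

MAX_SNAPSHOT_AGE = timedelta(hours=24)


def estimate_ccr_run(cca_hours=CCA_RUN_HOURS):
    """Hypothetical estimate: scale a CCA run linearly to a CCR run.

    Since the 10000x figure is stated as "at least", the result is a
    lower bound under the linear-scaling assumption.
    """
    return timedelta(hours=cca_hours * CCR_SLOWDOWN / CCA_SLOWDOWN)


def snapshot_too_old(snapshot, latest_commit, max_age=MAX_SNAPSHOT_AGE):
    """Illustrative check only: the snapshot is rejected when the latest
    commit on the branch is more than max_age newer than the snapshot."""
    return latest_commit - snapshot > max_age


if __name__ == "__main__":
    run = estimate_ccr_run()
    print("estimated CCR run:", run)

    # Snapshot time taken from the error message quoted above.
    snapshot = datetime(2014, 5, 7, 4, 36, 57)
    # Assumed: a commit lands on the branch just before the run finishes.
    latest_commit = snapshot + run
    print("rejected:", snapshot_too_old(snapshot, latest_commit))
```

Under these assumptions the run takes about 450 hours, i.e. nearly 19
days, so the result would be rejected whenever anything was committed to
the branch more than a day after the snapshot was taken.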

regards
Tomas
