
Re: What in the world is happening with castoroides and protosciurus?

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Dave Page <dpage(at)pgadmin(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What in the world is happening with castoroides and protosciurus?
Date:	2014-08-30 22:32:07
Message-ID:	20140830223207.GA844556@tornado.leadboat.com
Lists:	pgsql-hackers

On Tue, Aug 26, 2014 at 10:17:05AM +0100, Dave Page wrote:
> On Tue, Aug 26, 2014 at 1:46 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > For the last month or so, these two buildfarm animals (which I believe are
> > the same physical machine) have been erratically failing with errors that
> > reflect low-order differences in floating-point calculations.
> >
> > A recent example is at
> >
> > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=protosciurus&dt=2014-08-25%2010%3A39%3A52
> >
> > where the only regression diff is
> >
> > *** /export/home/dpage/pgbuildfarm/protosciurus/HEAD/pgsql.22860/src/test/regress/expected/hash_index.out Mon Aug 25 11:41:00 2014
> > --- /export/home/dpage/pgbuildfarm/protosciurus/HEAD/pgsql.22860/src/test/regress/results/hash_index.out Mon Aug 25 11:57:26 2014
> > ***************
> > *** 171,179 ****
> > SELECT h.seqno AS i8096, h.random AS f1234_1234
> > FROM hash_f8_heap h
> > WHERE h.random = '-1234.1234'::float8;
> > ! i8096 | f1234_1234
> > ! -------+------------
> > ! 8906 | -1234.1234
> > (1 row)
> >
> > UPDATE hash_f8_heap
> > --- 171,179 ----
> > SELECT h.seqno AS i8096, h.random AS f1234_1234
> > FROM hash_f8_heap h
> > WHERE h.random = '-1234.1234'::float8;
> > ! i8096 | f1234_1234
> > ! -------+-------------------
> > ! 8906 | -1234.12356777216
> > (1 row)
> >
> > UPDATE hash_f8_heap
> >
> > ... a result that certainly makes no sense. The results are not
> > repeatable, failing in equally odd ways in different tests on different
> > runs. This is happening in all the back branches too, not just HEAD.

> I have
> no idea what is causing the current issue - the machine is stable
> software-wise, and only has private builds of dependency libraries
> updated periodically (which are not used for the buildfarm). If I had
> to hazard a guess, I'd suggest this is an early symptom of an old
> machine which is starting to give up.

Agreed. Rerunning each animal against older commits would test that theory.
Say, run against the last 6 months of REL9_0_STABLE commits. If those runs
show today's failure frequencies instead of historic failure frequencies, it's
not a PostgreSQL regression. Not that I see a commit back-patched near the
time of the failure uptick (2014-08-06) that looks remotely likely to have
introduced such a regression.
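
A minimal sketch of that comparison, assuming the rerun results are reduced to a count of runs and a count of failures. The counts and the function names below are hypothetical, not buildfarm data. It asks how likely the rerun failure count would be if the machine still failed at its historic rate; a tiny probability on old, previously stable commits points at the machine rather than at PostgreSQL.

```python
from math import comb


def failure_rate(failures: int, runs: int) -> float:
    """Fraction of runs that failed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return failures / runs


def prob_at_least(failures: int, runs: int, historic_rate: float) -> float:
    """P(X >= failures) for X ~ Binomial(runs, historic_rate).

    Treats each run as an independent trial that fails with probability
    historic_rate, which is itself a simplifying assumption.
    """
    return sum(
        comb(runs, k) * historic_rate**k * (1 - historic_rate) ** (runs - k)
        for k in range(failures, runs + 1)
    )


# Hypothetical numbers: 2 failures in 200 historic runs of a branch,
# then 6 failures in 40 reruns of commits from the same period.
historic = failure_rate(failures=2, runs=200)
p = prob_at_least(failures=6, runs=40, historic_rate=historic)
print(f"historic rate {historic:.3f}, chance of >= 6/40 at that rate: {p:.2e}")
```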

It would be sad to lose our only buildfarm coverage of plain Solaris and of
the Sun Studio compiler, but having buildfarm members this unstable is a pain.
Perhaps have those animals retry the unreliable steps up to, say, 7 times?
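
A rough sketch of such a retry wrapper, assuming each unreliable step can be represented as a callable that returns True on success. The names `run_with_retries` and `make_flaky_step` are made up for illustration and are not part of the buildfarm client.

```python
from typing import Callable


def run_with_retries(step: Callable[[], bool], max_attempts: int = 7) -> int:
    """Run step until it succeeds; return the attempt number that passed.

    Raises RuntimeError if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if step():
            return attempt
    raise RuntimeError(f"step failed on all {max_attempts} attempts")


def make_flaky_step(failures_before_success: int) -> Callable[[], bool]:
    """Build a stand-in step that fails a fixed number of times, then passes."""
    remaining = [failures_before_success]

    def step() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return step


# The stand-in step fails twice, so the third attempt is the one that passes.
print(run_with_retries(make_flaky_step(2)))
```

Returning the attempt number keeps the flakiness visible: a step that needed several attempts can still be reported, so a real regression is less likely to hide behind the retries.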
