Re: Going for "all green" buildfarm results

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: alvherre(at)commandprompt(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Going for "all green" buildfarm results
Date: 2006-07-30 08:03:25
Message-ID: 44CC67CD.6090604@kaltenbrunner.cc
Lists: pgsql-hackers
Alvaro Herrera wrote:
> Stefan Kaltenbrunner wrote:
>> Tom Lane wrote:
>>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>>> FWIW: lionfish had a weird make check error 3 weeks ago which I
>>>> (unsuccessfully) tried to reproduce multiple times after that:
>>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-05-12%2005:30:14
>>> Weird.
>>>
>>>   SELECT ''::text AS eleven, unique1, unique2, stringu1 
>>>                 FROM onek WHERE unique1 < 50 
>>>                 ORDER BY unique1 DESC LIMIT 20 OFFSET 39;
>>> ! ERROR:  could not open relation with OID 27035
>>>
>>> AFAICS, the only way to get that error in HEAD is if ScanPgRelation
>>> can't find a pg_class row with the mentioned OID.  Presumably 27035
>>> belongs to "onek" or one of its indexes.  The very next command also
>>> refers to "onek", and doesn't fail, so what we seem to have here is
>>> a transient lookup failure.  We've found a btree bug like that once
>>> before ... wonder if there's still one left?
>> FYI: lionfish just managed to hit that problem again:
>>
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-07-29%2023:30:06
> 
> The error message this time is
> 
> ! ERROR:  could not open relation with OID 27006

Yeah, and before it was:
! ERROR:  could not open relation with OID 27035

which looks quite related :-)

> 
> It's worth mentioning that the portals_p2 test, which happens in the
> parallel group previous to where this test is run, also accesses the
> onek table successfully.  It may be interesting to see exactly what
> relation is 27006.

Sorry, but I don't have access to the cluster in question any more
(lionfish is quite resource-starved, and I only enabled keeping failed
builds on -HEAD after the last incident ...)
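
For whoever does get hold of a cluster from a failed run, a minimal sketch
of the lookup Alvaro suggests. It has to be run in the regression database
of that same cluster, since OIDs differ from one initdb to the next; 27006
is simply the OID from the latest error message, and the pg_index query is
my own addition for cross-checking against onek's indexes:

```sql
-- Which relation, if any, currently has the OID from the error message?
-- An empty result means no visible pg_class row carries that OID.
SELECT oid, relname, relkind
  FROM pg_class
 WHERE oid = 27006;

-- For comparison: the OIDs of onek itself and of its indexes.
SELECT 'onek'::regclass::oid AS onek_oid;

SELECT indexrelid AS index_oid, indexrelid::regclass AS index_name
  FROM pg_index
 WHERE indrelid = 'onek'::regclass;
```

If 27006 shows up as onek or one of its indexes, that would fit Tom's guess
of a transient pg_class lookup failure.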

> 
> The test alter_table, which is on the same parallel group as limit (the
> failing test), contains these lines:
> 
> ALTER INDEX onek_unique1 RENAME TO tmp_onek_unique1;
> ALTER INDEX tmp_onek_unique1 RENAME TO onek_unique1;

Hmm, interesting. lionfish is a slow box (250MHz MIPS) and particularly
low on memory (48MB + 140MB swap), so it is quite likely that the parallel
regression tests are driving it into swap. Maybe some sort of subtle
timing issue?
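
An untested sketch of how one might try to provoke this outside the
regression suite, assuming a regression database with onek loaded and two
concurrent psql sessions. Both statements are taken from the tests quoted
above; pairing them this way is only a guess that the index rename is the
relevant concurrent activity:

```sql
-- Session 1: repeat the rename pair from the alter_table test,
-- e.g. by feeding these two statements to psql in a loop.
ALTER INDEX onek_unique1 RENAME TO tmp_onek_unique1;
ALTER INDEX tmp_onek_unique1 RENAME TO onek_unique1;

-- Session 2: at the same time, repeat the query that failed in the
-- limit test and watch for "could not open relation with OID ...".
SELECT ''::text AS eleven, unique1, unique2, stringu1
  FROM onek WHERE unique1 < 50
  ORDER BY unique1 DESC LIMIT 20 OFFSET 39;
```

On a fast machine the window may be too small to hit; a box under heavy
swap pressure like lionfish would presumably widen it.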


Stefan
