Re: Going for "all green" buildfarm results

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: alvherre(at)commandprompt(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Going for "all green" buildfarm results
Date: 2006-07-30 08:03:25
Message-ID: 44CC67CD.6090604@kaltenbrunner.cc
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Stefan Kaltenbrunner wrote:
>> Tom Lane wrote:
>>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>>> FWIW: lionfish had a weird make check error 3 weeks ago which I
>>>> (unsuccessfully) tried to reproduce multiple times after that:
>>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-05-12%2005:30:14
>>> Weird.
>>>
>>> SELECT ''::text AS eleven, unique1, unique2, stringu1
>>> FROM onek WHERE unique1 < 50
>>> ORDER BY unique1 DESC LIMIT 20 OFFSET 39;
>>> ! ERROR: could not open relation with OID 27035
>>>
>>> AFAICS, the only way to get that error in HEAD is if ScanPgRelation
>>> can't find a pg_class row with the mentioned OID. Presumably 27035
>>> belongs to "onek" or one of its indexes. The very next command also
>>> refers to "onek", and doesn't fail, so what we seem to have here is
>>> a transient lookup failure. We've found a btree bug like that once
>>> before ... wonder if there's still one left?
>> FYI: lionfish just managed to hit that problem again:
>>
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-07-29%2023:30:06
>
> The error message this time is
>
> ! ERROR: could not open relation with OID 27006

Yeah, and before it was:
! ERROR: could not open relation with OID 27035

which looks quite related :-)

>
> It's worth mentioning that the portals_p2 test, which happens in the
> parallel group previous to where this test is run, also accesses the
> onek table successfully. It may be interesting to see exactly what
> relation is 27006.

Sorry, but I don't have access to the cluster in question any more
(lionfish is quite resource-starved, and I only enabled keeping failed
builds on -HEAD after the last incident ...)

>
> The test alter_table, which is on the same parallel group as limit (the
> failing test), contains these lines:
>
> ALTER INDEX onek_unique1 RENAME TO tmp_onek_unique1;
> ALTER INDEX tmp_onek_unique1 RENAME TO onek_unique1;

Hmm, interesting. lionfish is a slow box (250MHz MIPS) and particularly
low on memory (48MB + 140MB swap), so it is quite likely that the
parallel regression tests are driving it into swap. Maybe some sort of
subtle timing issue?
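
For anyone who wants to try provoking this outside the buildfarm, here
is a rough sketch only. It assumes a database named "regression" that
already contains the onek table and the onek_unique1 index, and that
psql is on the PATH. The function names, the database name and the
repeat count are made up for illustration. It simply runs the limit
query and the rename pair quoted above in two concurrent psql sessions
and collects any ERROR lines.

```python
import subprocess
import threading

# The statement from the failing "limit" test (quoted earlier in this mail).
LIMIT_QUERY = (
    "SELECT ''::text AS eleven, unique1, unique2, stringu1\n"
    "FROM onek WHERE unique1 < 50\n"
    "ORDER BY unique1 DESC LIMIT 20 OFFSET 39;\n"
)

# The rename pair from the "alter_table" test (quoted earlier in this mail).
RENAME_PAIR = (
    "ALTER INDEX onek_unique1 RENAME TO tmp_onek_unique1;\n"
    "ALTER INDEX tmp_onek_unique1 RENAME TO onek_unique1;\n"
)


def build_script(statements: str, repeat: int) -> str:
    """Return `statements` repeated `repeat` times as one psql input script."""
    return statements * repeat


def run_psql(dbname: str, script: str, errors: list) -> None:
    """Feed `script` to psql on stdin and record ERROR lines from stderr."""
    proc = subprocess.run(
        ["psql", "-X", "-q", "-d", dbname],
        input=script,
        text=True,
        capture_output=True,
    )
    errors.extend(line for line in proc.stderr.splitlines() if "ERROR" in line)


def hammer(dbname: str = "regression", repeat: int = 1000) -> list:
    """Run the query loop and the rename loop concurrently (hypothetical helper)."""
    errors: list = []
    threads = [
        threading.Thread(
            target=run_psql,
            args=(dbname, build_script(statements, repeat), errors),
        )
        for statements in (LIMIT_QUERY, RENAME_PAIR)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


if __name__ == "__main__":
    # Prints nothing if no session reported an error.
    for line in hammer():
        print(line)
```

Whether this hits the failure at all on a faster machine is an open
question; on lionfish the swapping may well be what opens the window.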

Stefan
