From: | Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> |
---|---|
To: | alvherre(at)commandprompt(dot)com |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Going for "all green" buildfarm results |
Date: | 2006-07-30 08:03:25 |
Message-ID: | 44CC67CD.6090604@kaltenbrunner.cc |
Lists: | pgsql-hackers |
Alvaro Herrera wrote:
> Stefan Kaltenbrunner wrote:
>> Tom Lane wrote:
>>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>>> FWIW: lionfish had a weird make check error 3 weeks ago which I
>>>> (unsuccessfully) tried to reproduce multiple times after that:
>>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-05-12%2005:30:14
>>> Weird.
>>>
>>> SELECT ''::text AS eleven, unique1, unique2, stringu1
>>> FROM onek WHERE unique1 < 50
>>> ORDER BY unique1 DESC LIMIT 20 OFFSET 39;
>>> ! ERROR: could not open relation with OID 27035
>>>
>>> AFAICS, the only way to get that error in HEAD is if ScanPgRelation
>>> can't find a pg_class row with the mentioned OID. Presumably 27035
>>> belongs to "onek" or one of its indexes. The very next command also
>>> refers to "onek", and doesn't fail, so what we seem to have here is
>>> a transient lookup failure. We've found a btree bug like that once
>>> before ... wonder if there's still one left?
>> FYI: lionfish just managed to hit that problem again:
>>
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2006-07-29%2023:30:06
>
> The error message this time is
>
> ! ERROR: could not open relation with OID 27006
Yeah, and before it was:
! ERROR: could not open relation with OID 27035
which looks quite related :-)
>
> It's worth mentioning that the portals_p2 test, which happens in the
> parallel group previous to where this test is run, also accesses the
> onek table successfully. It may be interesting to see exactly what
> relation is 27006.
Sorry, but I don't have access to the cluster in question any more
(lionfish is quite resource-starved, and I only enabled keeping failed
builds on -HEAD after the last incident ...)
>
> The test alter_table, which is on the same parallel group as limit (the
> failing test), contains these lines:
>
> ALTER INDEX onek_unique1 RENAME TO tmp_onek_unique1;
> ALTER INDEX tmp_onek_unique1 RENAME TO onek_unique1;
Hmm, interesting - lionfish is a slow box (250MHz MIPS) and particularly
low on memory (48MB + 140MB swap), so it is quite likely that the parallel
regression tests are driving it into swap - maybe some sort of subtle
timing issue?
Stefan