Re: strange buildfarm failures

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strange buildfarm failures
Date: 2007-04-28 12:44:34
Message-ID: 463341B2.3020608@kaltenbrunner.cc
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>> Stefan Kaltenbrunner wrote:
>>>> two of my buildfarm members had different but pretty weird looking
>>>> failures lately:
>>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-04-25%2002:03:03
>>>> and
>>>>
>>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=emu&dt=2007-04-24%2014:35:02
>>>>
>>>> any ideas on what might be causing those?
>
> Just for the record, quagga and emu failures don't seem related to the
> report below. They don't crash; the regression.diffs contains data that
> suggests that there may be data corruption of some sort.
>
> INSERT INTO INET_TBL (c, i) VALUES ('192.168.1.2/30', '192.168.1.226');
> ERROR: invalid cidr value: "%{"
>
> This doesn't seem to make much sense.

no idea - but quagga and emu seem to have similar failures (in the sense
that they don't make any sense) and i have no reason to believe that the
hardware is at fault.

>
>
>>> lionfish just failed too:
>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-25%2005:30:09
>> And had a similar failure a few days ago. The curious thing is that
>> what we get in the postmaster log is
>>
>> LOG: server process (PID 23405) was terminated by signal 6: Aborted
>> LOG: terminating any other active server processes
>>
>> You would think SIGABRT would come from an assertion failure, but
>> there's no preceding assertion message in the log. The other
>> characteristic of these crashes is that *all* of the failing regression
>> instances report "terminating connection because of crash of another
>> server process", which suggests strongly that the crash was in an
>> autovacuum process (if it were bgwriter or stats collector the
>> postmaster would've said so). So I think the recent autovac patches
>> are at fault. I spent a bit of time trolling for a spot where the code
>> might abort() without having printed anything, but didn't find one.
>
> Hmm. I kept an eye on the buildfarm for a few days, but saw nothing
> that could be connected to autovacuum so I neglected it.
>
> This is the other failure:
>
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-20%2005:30:14
>
> It shows the same pattern. I am baffled -- I don't understand how it
> can die without reporting the error.
>
> Apparently it crashes rather frequently, so it shouldn't be too
> difficult to reproduce on manual runs. If we could get it to run with a
> higher debug level, it might prove helpful to further pinpoint the
> problem.
>
> The core file would be much better obviously (first and foremost to
> confirm that it's autovacuum that's crashing ... )

well - i now have a core file, but it does not seem to be worth much
except to show that autovacuum seems to be the culprit:

Core was generated by `postgres: autovacuum worker process'.
Program terminated with signal 6, Aborted.

[...]

#0 0x00000ed9 in ?? ()
warning: GDB can't find the start of the function at 0xed9.

GDB is unable to find the start of the function at 0xed9
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
This problem is most likely caused by an invalid program counter or
stack pointer.
However, if you think GDB should simply search farther back
from 0xed9 for code which looks like the beginning of a
function, you can increase the range of the search using the
`set heuristic-fence-post' command.
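
For reference, the "signal 6 with nothing in the log" pattern discussed earlier in the thread can be reproduced outside PostgreSQL. The following is only a minimal sketch in Python, assuming a POSIX system; the names run_child, SILENT_ABORT and NOISY_ABORT and the "hypothetical check" message are made up for illustration and have nothing to do with the autovacuum code. It shows that a process calling abort() directly dies from SIGABRT without writing anything, while a process that reports a failed check first leaves a message behind.

```python
import signal
import subprocess
import sys


def run_child(snippet):
    """Run a Python snippet in a child process.

    Returns (terminating signal number or None, captured stderr text).
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # On POSIX, a negative return code -N means the child was killed by signal N.
    sig = -proc.returncode if proc.returncode < 0 else None
    return sig, proc.stderr


# Child that aborts without writing anything first (the "silent" case).
SILENT_ABORT = "import os; os.abort()"

# Child that reports a failed check on stderr before aborting,
# the way an assertion failure normally would (hypothetical message).
NOISY_ABORT = (
    "import os, sys; "
    "sys.stderr.write('assertion failed: hypothetical check\\n'); "
    "sys.stderr.flush(); "
    "os.abort()"
)

if __name__ == "__main__":
    for name, snippet in (("silent", SILENT_ABORT), ("noisy", NOISY_ABORT)):
        sig, err = run_child(snippet)
        print(name, "signal:", sig, "stderr:", repr(err))
```

Both children end with the same signal as far as the parent can tell; only the captured stderr distinguishes them, which is why a missing message in the postmaster log points at a direct abort() (or an equivalent raise of SIGABRT) rather than a reported assertion failure.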

Stefan
