Re: strange buildfarm failures

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strange buildfarm failures
Date: 2007-04-29 16:25:52
Message-ID: 20070429162552.GH18593@alvh.no-ip.org
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Stefan Kaltenbrunner wrote:
>
> > well - i now have a core file but it does not seem to be much worth
> > except to prove that autovacuum seems to be the culprit:
> >
> > Core was generated by `postgres: autovacuum worker process
> > '.
> > Program terminated with signal 6, Aborted.
> >
> > [...]
> >
> > #0 0x00000ed9 in ?? ()
> > warning: GDB can't find the start of the function at 0xed9.
>
> Interesting. Notice how it doesn't have the database name in the ps
> display. This means it must have crashed between the initial
> init_ps_display and the set_ps_display call just before starting to
> vacuum. So the bug is probably in the startup code; probably the code
> dealing with the PGPROC which is the newest and weirder stuff.

Oh, another thing that I think may be happening is that the stack is
restored by longjmp, so it is trying to report an error elsewhere but
crashes because something got overwritten along the way; i.e. a
bug in the error recovery code. I don't know how feasible this is or
even if it makes sense (would longjmp() restore the ps display?), but we
had similar, very hard to debug errors in Mammoth Replicator, which is
why I'm mentioning it in case it rings a bell.
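
On the parenthetical question above, a minimal sketch in plain C may help. It is not PostgreSQL code: fake_ps_display, recovery_point, set_title_then_fail and title_survives_longjmp are hypothetical names, and the static buffer only stands in for wherever the real process title lives. It illustrates the general C behaviour that longjmp() returns control to the setjmp() point without undoing writes to static memory, so a title written before the jump would still be there afterwards.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

/* Hypothetical stand-in for the process title; not the real ps display. */
static char fake_ps_display[64] = "autovacuum worker process";

/* Saved execution context, analogous to an error recovery point. */
static jmp_buf recovery_point;

/* Hypothetical helper: write a new title, then bail out as an error would. */
static void set_title_then_fail(const char *dbname)
{
    assert(strlen(dbname) < sizeof(fake_ps_display));
    strcpy(fake_ps_display, dbname);
    longjmp(recovery_point, 1);     /* jump back to the setjmp() call */
}

/* Returns 1 if the title written before longjmp() is still there afterwards. */
int title_survives_longjmp(void)
{
    strcpy(fake_ps_display, "autovacuum worker process");

    if (setjmp(recovery_point) == 0)
    {
        /* First pass: change the title and "fail". */
        set_title_then_fail("regression");
        return -1;                  /* not reached */
    }

    /* Second pass, reached via longjmp(): static memory was not rolled back. */
    return strcmp(fake_ps_display, "regression") == 0;
}
```

Calling title_survives_longjmp() returns 1, since the buffer keeps the value written before the jump. If the real ps display behaves the same way, a title without the database name suggests set_ps_display had not yet run, rather than that a longjmp undid it; that inference is an assumption, not something verified against the backend.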

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"The only difference is that Saddam would kill you on private, where the
Americans will kill you in public" (Mohammad Saleh, 39, a building contractor)
