Re: Postgresql 8.4.1 segfault, backtrace

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Neill <rn214(at)cam(dot)ac(dot)uk>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Postgresql 8.4.1 segfault, backtrace
Date: 2009-09-24 15:16:06
Message-ID: 23820.1253805366@sss.pgh.pa.us
Lists: pgsql-bugs

Richard Neill <rn214(at)cam(dot)ac(dot)uk> writes:
> I've just upgraded from 8.4.0 to 8.4.1 because of a segfault in 8.4, and
> we've found that this is still happening repeatedly in 8.4.1.

Oh dear. I just got an off-list report that seems to point to the same
kind of thing.

> The backtrace points to line 2654 in relcache.c, in
> RelationCacheInitializePhase2()

> There is a NULL dereference of "relation"

> => needNewCacheFile = false
> criticalRelcachesBuilt = true

> => nothing is happening before it enters the failure code block.

<spock>Fascinating.</spock>

I think this must mean that corrupt data is being read from the relcache
init file. The reason a restart fixes it is probably that restart
forcibly removes the old init file, which is good for recovery but not
so good for finding out what's wrong. Could you modify
RelationCacheInitFileRemove (at the bottom of relcache.c) to rename the
file someplace else instead of deleting it? And then send me a copy
of the bad file once you have one?
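
For illustration, the rename-instead-of-delete idea could look roughly like
the sketch below. This is not the actual relcache.c code: the helper name
preserve_init_file, the save_dir parameter, the fixed buffer sizes, and the
timestamp suffix are all assumptions made for the example. The only detail
taken from PostgreSQL is the init file's name, pg_internal.init. Inside the
server, the same rename() call would stand in for the unlink() that
RelationCacheInitFileRemove performs.

```c
#include <errno.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper (not part of PostgreSQL): move the relcache init
 * file out of db_path into save_dir instead of deleting it, so a possibly
 * corrupt copy survives for later inspection.
 *
 * Returns 1 if a file was preserved, 0 if there was nothing to preserve,
 * and -1 on any other error.  save_dir must already exist and be on the
 * same filesystem as db_path, because rename() cannot cross filesystems.
 */
int
preserve_init_file(const char *db_path, const char *save_dir)
{
    char        src[1024];
    char        dst[1024];
    int         n;

    n = snprintf(src, sizeof(src), "%s/pg_internal.init", db_path);
    if (n < 0 || (size_t) n >= sizeof(src))
        return -1;              /* path too long */

    /* Timestamp suffix so successive bad files do not overwrite each other */
    n = snprintf(dst, sizeof(dst), "%s/pg_internal.init.%ld",
                 save_dir, (long) time(NULL));
    if (n < 0 || (size_t) n >= sizeof(dst))
        return -1;              /* path too long */

    if (rename(src, dst) != 0)
    {
        /*
         * ENOENT normally means the init file is not there (it can also
         * mean save_dir is missing).  Treat it as "nothing to do", much
         * as a plain unlink() that ignores errors would.
         */
        if (errno == ENOENT)
            return 0;
        return -1;
    }
    return 1;
}
```

Two calls within the same second would produce the same destination name,
so a real patch might want to add the backend PID or a counter to the
suffix.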

> I can give you a core dump if anyone would like to see it, but it's 405
> MB after bzipping.

Not going to help anyone else anyway, since it's uninterpretable without
a duplicate system. (If you have a spare machine with the same OS and
the same postgres executables, maybe you could put the core file on that
and let me ssh in to have a look?)

> One last observation: a dump and restore of the DB seems to prevent it
> crashing for about a day.

Do you have any maintenance operations that touch the system catalogs
(like maybe a forced REINDEX)? Can you correlate the crashes with any
activity of that sort?

BTW, the other reporter claimed that the problem went away after
building with asserts+debug. I'm not sure I believe that, especially
seeing that you evidently have debug on. But if you don't have asserts
enabled, please rebuild with them and see if that changes anything.

regards, tom lane
