Re: clarifying a few error messages

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Thomas O'Connell" <tfo(at)monsterlabs(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: clarifying a few error messages
Date: 2003-01-13 20:03:51
Message-ID: 4782.1042488231@sss.pgh.pa.us
Lists: pgsql-general

"Thomas O'Connell" <tfo(at)monsterlabs(dot)com> writes:
> So I've managed to determine that the interrupt messages most likely
> coincided with the server reboots.
> Could the same thing have caused the signal 11? An unexpected external
> event?

My guess is that you've got hardware problems, most likely bad RAM. The
SIGSEGV is probably a side-effect of RAM dropping bits unexpectedly ---
for example, the value of a pointer stored in memory might have changed
so that it appears to point outside Postgres' valid address space,
leading to SIGSEGV next time the pointer is used.
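
As a rough illustration of the pointer scenario above, here is a minimal sketch of how a single flipped bit can move a stored address outside a process's mapped memory. The region bounds and the helper names (`in_region`, `flip_bit`) are hypothetical and chosen only for the example; real address-space layouts differ, and this models the arithmetic rather than an actual crash.

```python
# Hypothetical mapped region, for illustration only; real layouts differ.
REGION_START = 0x0000_5555_0000_0000
REGION_SIZE = 0x1000_0000  # 256 MiB


def in_region(addr: int) -> bool:
    """Return True if addr lies inside the hypothetical mapped region."""
    return REGION_START <= addr < REGION_START + REGION_SIZE


def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted, as a faulty RAM cell might do."""
    return value ^ (1 << bit)


# A valid pointer value sitting in memory.
pointer = REGION_START + 0x1234

# A flipped high-order bit moves the address far outside the region;
# dereferencing such a pointer is what would raise SIGSEGV.
corrupted_high = flip_bit(pointer, 40)

# A flipped low-order bit keeps the address inside the region, so nothing
# crashes, but the pointer now refers to the wrong data.
corrupted_low = flip_bit(pointer, 3)

for name, addr in [("original", pointer),
                   ("high bit flipped", corrupted_high),
                   ("low bit flipped", corrupted_low)]:
    print(f"{name:>16}: {addr:#018x} in_region={in_region(addr)}")
```

The low-bit case is the more worrying one: the address stays valid, so the damage goes unnoticed until wrong data is read or written.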

The fact that you're seeing unexpected reboots is what points the finger
at the hardware; evidently the kernel is suffering the same kinds of
problems. (Or you could believe that your hardware is okay and both the
kernel and Postgres have suddenly developed severe bugs; but the
hardware theory seems much more plausible.)

> And is exit code 2 just related to the bad clog?

Yes. This part looks like corrupted data on disk :-( ... likely also a
side effect of busted RAM. Probably the RAM corrupted a page image that
was sitting in an in-memory buffer, and then it got written out before
any other problem was noticed.

I hope you have a recent good backup that you can restore from after you
fix your hardware. I would not trust what's presently on your disk if I
were you.

regards, tom lane
