
Re: RESOLVED: Explained by known hardware failures, or keep looking?

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<pgsql-admin(at)postgresql(dot)org>
Subject:	Re: RESOLVED: Explained by known hardware failures, or keep looking?
Date:	2007-06-20 14:26:32
Message-ID:	4678F2C7.EE98.0025.0@wicourts.gov
Lists:	pgsql-admin

Thanks, all. Just an FYI to wrap up the thread.

>>> On Mon, Jun 18, 2007 at 3:25 PM, in message <4713(dot)1182198324(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> I'm suspicious that either the controller
>> didn't persist dirty pages in the June 14th failure
>
> That's what it looks like to me --- it's hard to tell if the hardware or
> the filesystem is at fault, but one way or another some pages that were
> supposedly securely down on disk were wiped to zeroes. You should
> probably review whether the hardware is correctly reporting write-complete.

The hardware tech found many problems with this box. I may just give it
a heavy update load and pull both plugs to see if it comes up clean now.
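
For checking the result of a pull-the-plug test like that, here is a minimal sketch of one way to look for pages that came back as all zeroes. It is an illustration only, not something run on this box: the function name is made up, it assumes the default 8192-byte block size, and a zeroed page is not proof of corruption by itself (a freshly extended page can legitimately be all zeroes), so any hits just tell you where to look more closely.

```python
BLOCK_SIZE = 8192  # assumption: PostgreSQL's default block size


def find_zeroed_pages(path, block_size=BLOCK_SIZE):
    """Return the block numbers in the file at `path` that contain only zero bytes.

    Hypothetical helper for illustration; `path` would be a relation file
    copied from (or read on) a stopped cluster.
    """
    zero_page = bytes(block_size)
    zeroed = []
    with open(path, "rb") as f:
        block_no = 0
        while True:
            page = f.read(block_size)
            if not page:
                break
            # A short trailing read is compared against zeroes of the same length.
            if page == zero_page[:len(page)]:
                zeroed.append(block_no)
            block_no += 1
    return zeroed
```

Calling `find_zeroed_pages` on each file of interest before and after the test, and comparing the two lists, would show whether any page that previously held data was wiped.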

The following was done:

Replaced 2 failed drives
Updated controller firmware
Updated SCSI microcode
Performed YaST online updates
Connected second power supply

Our newer boxes have monitoring software which alerts us before a box
gets into this bad a state.

-Kevin
