Re: Backend Crash

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Harvell F" <fharvell(at)file13(dot)info>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Backend Crash
Date: 2007-04-18 17:46:14
Message-ID: 87lkgppod5.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Harvell F" <fharvell(at)file13(dot)info> writes:

> Just as a follow up, it turns out that our fiberchannel RAID was power cycled
> while the systems were up and running. There are several write errors in the
> postgresql log.
>
> Now I'm off to try to recover the data...

That's still a problem: it indicates either a bug in Postgres or -- sadly more
likely -- a problem with your hardware or system software setup. In a working
system Postgres guarantees that a situation like that will result in
transactions failing to commit (either by raising errors or by hanging), not
in corrupted data. Data once committed should never be lost.

In order for this to happen, something in your software and hardware setup
must be caching writes and then hiding the errors from Postgres. For instance,
systems where fsync lies and reports success before it has written the data to
disk can result in silently corrupted data on any power outage or system crash.
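
To make the contract concrete, here is a minimal sketch in Python of what a
database relies on from the storage stack. It is not Postgres's actual code,
and the names durable_write and try_commit are hypothetical. The only
assumption is the usual one: fsync returns only once the data is on stable
storage, and any write failure surfaces as an error to the caller.

```python
import os


def durable_write(path, data):
    """Write data to path and return only after fsync reports success.

    Hypothetical helper for illustration; raises OSError on any failure.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # If the kernel reports that the flush failed, this raises OSError.
        os.fsync(fd)
    finally:
        os.close(fd)


def try_commit(path, data):
    """Report a commit as successful only if the write was flushed."""
    try:
        durable_write(path, data)
    except OSError:
        # A visible error means the transaction fails instead of being lost.
        return False
    return True
```

If some layer below acknowledges the fsync before the data is actually on
disk, try_commit returns True for data that a power cycle can still destroy,
and no application-level check can detect that.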

Could you send the write errors? Or at least the first page or so of them?
And check the system logs at that time for any lower-level errors as well.

What kind of drives are in the Fibre Channel RAID? Are they SCSI, PATA, or
SATA? Can you check their configuration at all or does the RAID hide all that
from you? Does the RAID have a battery-backed cache?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
