Re: Another PANIC corrupt index/crash ...any thoughts?

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Another PANIC corrupt index/crash ...any thoughts?
Date: 2010-02-01 15:22:13
Message-ID: dcc563d11002010722j1f76067fl211a2cc56d45c56d@mail.gmail.com
Lists: pgsql-general

On Mon, Feb 1, 2010 at 7:45 AM, Jeff Amiel <becauseimjeff(at)yahoo(dot)com> wrote:
> About a month ago I posted about a database crash possibly caused by corrupt index..
> Coincidentally (or not) started getting disk errors about a minute AFTER the above error (db storage is on a fibre attached SAN)

Not likely a coincidence.

> /var/log/archive/log-2010-01-29.log:Jan 29 15:18:50 db-1 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1
> /var/log/archive/log-2010-01-29.log:Jan 29 15:18:50 db-1 scsi: [ID 243001 kern.warning] WARNING: /pci(at)0,0/pci10de,5d(at)d/pci1077,142(at)0/fp(at)0,0 (fcp1):
> /var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk(at)g000b08001c001958 (sd9):
> /var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice]  Requested Block: 206265378                 Error Block: 206265378
> /var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice]  Vendor: Pillar                             Serial Number:
> /var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice]  Sense Key: Unit Attention
> /var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
>
> Any thoughts on how I should proceed?

Figure out what's broken in your hardware? It looks like a driver issue to me.

> We are planning an upgrade to 8.4 in the short-term, but I can see no evidence of fixes since the 8.2 version that would relate to index corruption.

This is not a PostgreSQL issue; it is a bad hardware / driver issue.
PostgreSQL cannot cause a SCSI reset on its own. Something has to be
broken in the OS / hardware for that to happen.

> I have no real evidence of bad disks...iostat -E reports:

No, but this iostat output is evidence of a bad SAN driver, a bad SAN,
or something else in that path.

>
> # iostat -E
> sd2       Soft Errors: 1 Hard Errors: 4 Transport Errors: 0
> Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
> Size: 2.20GB <2200567296 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 4 Recoverable: 0
> Illegal Request: 1 Predictive Failure Analysis: 0
> sd3       Soft Errors: 1 Hard Errors: 32 Transport Errors: 0
> Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
> Size: 53.95GB <53948448256 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 32 Recoverable: 0
> Illegal Request: 1 Predictive Failure Analysis: 0
> sd7       Soft Errors: 1 Hard Errors: 40 Transport Errors: 8
> Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
> Size: 53.95GB <53948448256 bytes>
> Media Error: 0 Device Not Ready: 1 No Device: 33 Recoverable: 0
> Illegal Request: 1 Predictive Failure Analysis: 0
> sd8       Soft Errors: 1 Hard Errors: 34 Transport Errors: 0
> Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
> Size: 107.62GB <107622432256 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 34 Recoverable: 0
> Illegal Request: 1 Predictive Failure Analysis: 0
> sd9       Soft Errors: 1 Hard Errors: 32 Transport Errors: 2
> Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
> Size: 215.80GB <215796153856 bytes>
> Media Error: 0 Device Not Ready: 1 No Device: 29 Recoverable: 0
> Illegal Request: 1 Predictive Failure Analysis: 0
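
To make the counters above easier to compare across devices, here is a
minimal sketch that parses `iostat -E` text in the layout quoted above and
lists the devices reporting transport errors. The function names
(`parse_iostat_e`, `devices_with_transport_errors`) are hypothetical, the
regular expressions assume exactly the field labels shown in this output,
and the sample is limited to two of the quoted devices. It is only an
illustration, not a diagnostic tool.

```python
import re

# Two device stanzas copied from the iostat -E output quoted above.
SAMPLE = """\
> sd2       Soft Errors: 1 Hard Errors: 4 Transport Errors: 0
> Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
> Size: 2.20GB <2200567296 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 4 Recoverable: 0
> Illegal Request: 1 Predictive Failure Analysis: 0
> sd7       Soft Errors: 1 Hard Errors: 40 Transport Errors: 8
> Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
> Size: 53.95GB <53948448256 bytes>
> Media Error: 0 Device Not Ready: 1 No Device: 33 Recoverable: 0
> Illegal Request: 1 Predictive Failure Analysis: 0
"""

# First line of each stanza: device name plus the three summary counters.
HEADER = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)"
)
# Detail counters that appear on a later line of the same stanza.
COUNTER = re.compile(r"(Media Error|Device Not Ready|No Device):\s*(\d+)")


def parse_iostat_e(text):
    """Return {device: {counter_name: int}} for each stanza in the text."""
    devices = {}
    current = None
    for raw in text.splitlines():
        # Drop email quote markers ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").rstrip()
        match = HEADER.match(line)
        if match:
            current = {
                "soft": int(match.group(2)),
                "hard": int(match.group(3)),
                "transport": int(match.group(4)),
            }
            devices[match.group(1)] = current
            continue
        if current is not None:
            for name, value in COUNTER.findall(line):
                current[name.lower().replace(" ", "_")] = int(value)
    return devices


def devices_with_transport_errors(devices):
    """Return the sorted names of devices whose transport counter is non-zero."""
    return sorted(name for name, c in devices.items() if c["transport"] > 0)


if __name__ == "__main__":
    parsed = parse_iostat_e(SAMPLE)
    for name, counters in parsed.items():
        print(name, counters)
    print("transport errors on:", devices_with_transport_errors(parsed))
```

Run against the full output, this would single out the devices with
non-zero transport errors, which is where I would start looking at the
HBA driver and the fibre path.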
