Re: Duplicate values found when reindexing unique index

From: "Mason Hale" <masonhale(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Duplicate values found when reindexing unique index
Date: 2008-01-03 19:58:23
Message-ID: 8bca3aa10801031158l26a2086bq8caba8151e89a316@mail.gmail.com
Lists: pgsql-bugs

Hi everyone --

Sorry to revisit a dead horse, but I wanted to clear up some
misinformation --

On Dec 31, 2007 5:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Mason Hale" <masonhale(at)gmail(dot)com> writes:
> >> This could be the kernel's fault, but I'm wondering whether the
> >> RAID controller is going south.
>
> > To clarify a bit further -- on the production server, the data is
> > written to a 10-disk RAID 1+0, but the pg_xlog directory is symlinked
> > to a separate, dedicated SATA II disk.
>
> > There is a similar setup on the standby server, except that in addition
> > to the RAID for the data, and a separate SATA II disk for the pg_xlog,
> > there is another disk (also SATA II) dedicated for the archive of wal
> > files copied over from the production server.
>

It turns out that the separate SATA II disk was configured as a single-disk
JBOD on the same controller as the 10-disk RAID 1+0.

We've seen corruption in both the data directory (on the RAID) and the
pg_xlog directory (on the SATA II disk). The RAID controller is one of the
few elements common to those two partitions, so it is highly suspect. That
may dispel some of the mystery in our situation.

We will be replacing the RAID controller in short order. For what it is
worth, it is an Adaptec 31605 with a battery backup module.

>
> Oh. Maybe it's one of those disks' fault then. Although WAL corruption
> would not lead to corruption of the primary DB as long as there were no
> crash/replay events. Maybe there is more than one issue here, or maybe
> it's the kernel's fault after all.
>
>
Given the new information that the RAID controller manages all the disks
in question (after all) -- if the RAID controller is going south, then no
crash/replay event would be needed for that corruption to make it into the
primary DB. That seems to be pretty damning evidence against the RAID
controller, agreed?

Mason
