Re: Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE

From: Travis Cross <travis(at)crosswirecorp(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE
Date: 2006-06-06 16:58:10
Message-ID: 4485B422.1050604@crosswirecorp.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> Travis Cross <travis(at)crosswirecorp(dot)com> writes:
>> I'm noticing that a handful (4-16) of rows with duplicate columns
>> (uid,token) are sneaking into the table every day despite the
>> primary key constraint.
>
> Corrupt index, looks like ... you might try reindexing the index.

I probably should have mentioned that I have indeed done a REINDEX
on the table a couple of times in the past, suspecting that issue,
and having seen it resolve similar issues on this list. Upon your
suggestion, I'm running one right now, and I will probably dump and
reload the entire database after hours, unless anyone thinks that
would be a bad idea (or unproductive in tracking this down).

> I don't believe that the PANIC you show has anything directly to do
> with duplicate entries. It is a symptom of corrupt index structure.
> Now a corrupt index might also explain failure to notice duplications,
> but changing your application isn't going to fix whatever is causing
> it. You need to look for server-side causes.

Indeed, you are correct. I should also mention that the problem
seems to build over time, in the sense that everything will run fine
for awhile (a few days), and then will crash repeatedly. Deleting
the duplicate rows seems to reset the counter -- of course, I cannot
run a successful REINDEX until I have deleted those duplicate rows.

> Any database or system crashes on this server (before this problem
> started)?

No. In fact, this box, and a sister box running similar hardware,
have been models of system stability. My uptimes are 46 and 87
days, respectively, representing the time since I've done a kernel
upgrade and the time since I plugged the boxes into the rack. The
sister box is running real-time voice services.

> Do you *know* that the disk drive will not lie about write
> complete?

"Know" is such a strong word ;) Honestly, I have very little idea.
I understand the nature of the problem this presents, as I've read
the very fine PostgreSQL manual many times over the years.

Because the drives I use are specifically designed to operate well
in a RAID environment, I would 'hope' that the drives perform honest
write operations.

I wonder if there is a utility to perform a deterministic test of
this...

> What is the platform and storage system, anyway?

The platform is:

Linux 2.6.16.9 (w/o loadable modules)
Supermicro PDSMi (a single processor P-D board)
2G ECC DDRII SDRAM

The storage system is:

On-board SATA ICH7R Controller
2 x WD3200SD hard drives running in a Linux RAID 1 configuration.
That is to say: Western Digital 320G SATA 'enterprise' drives. The
drives have a somewhat unique feature: time-limited error recovery,
which is supposed to let the RAID controller/software deal with
errors after a certain point (7 seconds), rather than continuing to
block, and causing the drive to fall out of the array.

The drive:
http://www.westerndigital.com/en/products/products.asp?driveid=114&language=en

I'll run file system consistency checks tonight to see if I can pick
out a proximal cause for all this chaos.

I really do appreciate the assistance.

Cheers,

-- Travis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Travis Cross 2006-06-06 17:02:07 Re: Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE
Previous Message Bruce Momjian 2006-06-06 16:20:03 Re: AIX check in datetime.h