Index corruption

From: Marc Munro <marc(at)bloodnok(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Index corruption
Date: 2006-06-28 16:28:14
Message-ID: 1151512094.26442.34.camel@bloodnok.com
Lists: pgsql-hackers

We have now experienced index corruption on two separate but identically
configured Slony clusters. In each case the Slony subscriber failed after
attempting to insert a duplicate record, and in each case reindexing the
sl_log_1 table on the provider fixed the problem.

The latest occurrence was on our production cluster yesterday. This has
only happened since we performed kernel upgrades, and we are uncertain
whether it represents a kernel bug or a postgres bug exposed by
different timings in the new kernel.

Our systems are:

Sun v40z 4 x Dual Core AMD Opteron(tm) Processor 875
Kernel 2.6.16.14 #8 SMP x86_64 x86_64 x86_64 GNU/Linux
kernel boot option: elevator=deadline
16 Gigs of RAM
postgresql-8.0.3-1PGDG
Bonded e1000/tg3 NICs with 8192 MTU.
Slony 1.1.0

NetApp FAS270 OnTap 7.0.3
Mounted with the NFS options:
rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=600,tcp,noac
Jumbo frames 8192 MTU.

All postgres data and logs are stored on the netapp.

In the latest episode, the index corruption coincided with a
Slony-induced vacuum. I don't know whether this was also the case with
our test system failures.

What can we do to help identify the cause of this? I believe we will be
able to reproduce this on a test system if there is some useful
investigation we can perform.
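
One check that could be run on a test system is to read the same keys from
sl_log_1 twice, once through a sequential scan and once through the index,
and compare the results. The following is only a minimal sketch of that
comparison. It assumes the key tuples have already been fetched (for
example with the planner settings enable_indexscan and enable_seqscan
toggled per session); the function name, the assumed key columns
(log_origin, log_xid, log_actionseq) and the sample rows are hypothetical.

```python
from collections import Counter


def compare_scans(seq_rows, idx_rows):
    """Compare key tuples returned by a sequential scan and an index scan.

    Both arguments are lists of tuples, assumed here to be
    (log_origin, log_xid, log_actionseq). How they are fetched is left
    to the caller, e.g. one query run with enable_indexscan = off and
    one with enable_seqscan = off.
    """
    seq = Counter(seq_rows)
    idx = Counter(idx_rows)
    return {
        # Rows present in the heap that the index scan did not return.
        "missing_from_index": sorted((seq - idx).elements()),
        # Rows the index scan returned that the heap scan did not.
        "extra_in_index": sorted((idx - seq).elements()),
        # Keys that appear more than once in the heap itself.
        "duplicates_in_heap": sorted(k for k, n in seq.items() if n > 1),
    }


# Hypothetical sample data, not taken from a real cluster.
seq_rows = [(1, 100, 1), (1, 100, 2), (1, 101, 3)]
idx_rows = [(1, 100, 1), (1, 101, 3)]

report = compare_scans(seq_rows, idx_rows)
for name, rows in report.items():
    print(name, rows)
```

Any non-empty "missing_from_index" or "extra_in_index" list would suggest
that the index and the table disagree, which is the condition a reindex
would be papering over.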

__
Marc
