Re: Index corruption

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marc Munro <marc(at)bloodnok(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index corruption
Date: 2006-06-30 01:59:24
Message-ID: 19068.1151632764@sss.pgh.pa.us
Lists: pgsql-hackers

Marc Munro <marc(at)bloodnok(dot)com> writes:
> We have now experienced index corruption on two separate but identical
> slony clusters.  In each case the slony subscriber failed after
> attempting to insert a duplicate record.  In each case reindexing the
> sl_log_1 table on the provider fixed the problem.

> Sun v40z 4 x Dual Core AMD Opteron(tm) Processor 875
> Kernel 2.6.16.14 #8 SMP x86_64 x86_64 x86_64 GNU/Linux
> ...
> NetApp FAS270 OnTap 7.0.3
> Mounted with the NFS options
> rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=600,tcp,noac
> Jumbo frames 8192 MTU.
> All postgres data and logs are stored on the netapp.

BTW, a couple of thoughts here:

* If my theory about the low-level cause is correct, then reindexing
sl_log_1 would make the "duplicate key" errors go away, but nonetheless
you'd have lost data --- the overwritten rows would be gone.  I suppose
that this would result in the slave missing some rows that are present
on the master.  Have you tried comparing slave and master databases to
see if you can find any discrepancies?

* One way that the problem could happen would be if a race condition in
the kernel allowed an lseek(fd, 0, SEEK_END) to return a value less than
the true end-of-file (as determined by another process's write()
extending the EOF just slightly earlier --- i.e., lseek fails to note the
effects of the just-completed write, and returns the prior EOF value).
PG does have internal locking that should guarantee that the lseek is
not done until after the write completes ... but could there be a bug in
the kernel allowing stale data to be returned?  The SMP hardware is
relevant (maybe one processor sees different data than the other) and
frankly I don't trust NFS very far at all for questions such as this.
It'd be interesting to see if you can reproduce the problem in a
database on local storage.
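
The stale-EOF scenario in the second point can be probed outside of PG
with something like the sketch below. It is only an illustration, not
code from PostgreSQL: the function name count_stale_eof_reads, the
8192-byte block size and the pipe handshake are all assumptions made for
the example. One process appends a block and only then signals the
other, which calls lseek(fd, 0, SEEK_END) and checks that the result
covers the block just written. The handshake stands in for PG's internal
locking; a nonzero return would mean lseek reported a stale EOF.

```python
import os

BLOCK_SIZE = 8192  # assumption: matches PostgreSQL's default page size


def count_stale_eof_reads(path, n_blocks=200):
    """Count lseek(SEEK_END) results that fall short of the known EOF.

    A child process appends one block at a time; the parent checks the
    file length only after the child reports that its write() returned.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    done_r, done_w = os.pipe()  # child -> parent: "block written"
    next_r, next_w = os.pipe()  # parent -> child: "checked, carry on"

    pid = os.fork()
    if pid == 0:
        # Child: the writer. Always leave via os._exit so it never
        # returns into the caller's code.
        try:
            os.close(done_r)
            os.close(next_w)
            wfd = os.open(path, os.O_WRONLY | os.O_APPEND)
            for _ in range(n_blocks):
                os.write(wfd, b"\0" * BLOCK_SIZE)  # extend the file
                os.write(done_w, b"x")             # signal after write()
                os.read(next_r, 1)                 # wait for the reader
        finally:
            os._exit(0)

    # Parent: the reader.
    os.close(done_w)
    os.close(next_r)
    stale = 0
    try:
        for i in range(1, n_blocks + 1):
            if not os.read(done_r, 1):
                raise RuntimeError("writer exited early")
            # The write of block i has completed, so EOF must be at
            # least i * BLOCK_SIZE; anything smaller is a stale value.
            if os.lseek(fd, 0, os.SEEK_END) < i * BLOCK_SIZE:
                stale += 1
            os.write(next_w, b"x")
    finally:
        os.close(done_r)
        os.close(next_w)
        os.close(fd)
        os.waitpid(pid, 0)
    return stale
```

Running this against a file on the NFS mount and then against one on
local disk would give a rough comparison. A clean run proves nothing, of
course, since a race of this kind may need far more iterations or real
concurrent load to show up.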

			regards, tom lane
