Re: Index corruption

From: Marc Munro <marc(at)bloodnok(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index corruption
Date: 2006-06-30 02:14:19
Message-ID: 1151633659.3913.111.camel@bloodnok.com
Lists: pgsql-hackers

On Thu, 2006-06-29 at 21:59 -0400, Tom Lane wrote:
> [ back to the start of the thread... ]
>
> BTW, a couple of thoughts here:
>
> * If my theory about the low-level cause is correct, then reindexing
> sl_log_1 would make the "duplicate key" errors go away, but nonetheless
> you'd have lost data --- the overwritten rows would be gone. I suppose
> that this would result in the slave missing some rows that are present
> on the master. Have you tried comparing slave and master databases to
> see if you can find any discrepancies?

Haven't done that yet - in test we tend to restart the old subscriber as
the new provider and rebuild the cluster. I'll check the logs from our
production failure to figure out what to compare and see what I can
discover.
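
One rough way to do the comparison Tom asks about is to pull the same table from the master and the slave and diff the rows by primary key. The sketch below assumes the rows have already been fetched into Python sequences with the key in the first column. The function name and row layout are made up for illustration and are not part of Slony or PostgreSQL.

```python
def compare_rows(master_rows, slave_rows, key=lambda row: row[0]):
    """Return (missing, differing) relative to the master.

    missing   -- master rows whose key does not appear on the slave
    differing -- (master_row, slave_row) pairs that share a key but differ
    """
    # Index the slave rows by key so each master row is a single lookup.
    slave_by_key = {key(row): row for row in slave_rows}
    missing, differing = [], []
    for row in master_rows:
        other = slave_by_key.get(key(row))
        if other is None:
            missing.append(row)            # lost on the slave
        elif tuple(other) != tuple(row):
            differing.append((row, other))  # present but not identical
    return missing, differing
```

If Tom's theory holds, the overwritten rows would be expected to show up in the `missing` list.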

> * One way that the problem could happen would be if a race condition in
> the kernel allowed an lseek(fd, 0, SEEK_END) to return a value less than
> the true end-of-file (as determined by another process' write()
> extending the EOF just slightly earlier --- ie, lseek fails to note the
> effects of the just-completed write, and returns the prior EOF value).
> PG does have internal locking that should guarantee that the lseek is
> not done until after the write completes ... but could there be a bug in
> the kernel allowing stale data to be returned? The SMP hardware is
> relevant (maybe one processor sees different data than the other) and
> frankly I don't trust NFS very far at all for questions such as this.
> It'd be interesting to see if you can reproduce the problem in a
> database on local storage.

Unfortunately we haven't got any local storage that can stand the sort
of loads we are putting through. With slower storage the CPUs mostly
sit idle and we are very unlikely to trigger a timing-based bug if
that's what it is.
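
For anyone who wants to probe the stale-EOF theory outside of PostgreSQL, something like the following stress test might be a starting point. It is only a sketch under stated assumptions: several workers, each with its own descriptor, extend one file under a lock (standing in for the internal locking Tom mentions) and check that `lseek(fd, 0, SEEK_END)` never reports less than the size the previous writer left behind. Threads are used to keep it short; a faithful reproduction would more likely need separate processes, the NFS mount in question, and much heavier load. The names `extend_file` and `BLOCK` are invented for this example.

```python
import os
import threading

BLOCK = 8192  # assumed block size, used only as a convenient write unit


def extend_file(path, workers=4, rounds=200):
    """Return how many times lseek(SEEK_END) reported a stale EOF."""
    lock = threading.Lock()
    state = {"expected_eof": os.path.getsize(path), "stale": 0}

    def worker():
        fd = os.open(path, os.O_RDWR)  # each worker gets its own descriptor
        try:
            for _ in range(rounds):
                with lock:  # the lseek only happens after the prior write is done
                    eof = os.lseek(fd, 0, os.SEEK_END)
                    if eof < state["expected_eof"]:
                        # A stale EOF: the write below would overwrite a live block.
                        state["stale"] += 1
                    os.write(fd, b"\0" * BLOCK)  # extend at the reported EOF
                    state["expected_eof"] = max(state["expected_eof"], eof + BLOCK)
        finally:
            os.close(fd)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["stale"]
```

On a correctly behaving filesystem the function should return 0 and the file should end up `workers * rounds * BLOCK` bytes long. A nonzero count, or a short file, would point at the kind of lost write described above.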

I'll get back to you with kernel build information tomorrow. We'll also
try to talk to some kernel hackers about this.

Many thanks for your efforts so far.
__
Marc
