Re: Index corruption

From: Marc Munro <marc(at)bloodnok(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Index corruption
Date: 2006-06-30 02:00:22
Message-ID: 1151632823.3913.97.camel@bloodnok.com
Lists: pgsql-hackers

On Thu, 2006-06-29 at 21:47 -0400, Tom Lane wrote:
> One easy thing that would be worth trying is to build with
> --enable-cassert and see if any Asserts get provoked during the
> failure case. I don't have a lot of hope for that, but it's
> something that would require only machine time not people time.

I'll try this tomorrow.

> A couple other things to try, given that you can provoke the failure
> fairly easily:
>
> 1. In studying the code, it bothers me a bit that P_NEW is the same as
> InvalidBlockNumber. The intended uses of P_NEW appear to be adequately
> interlocked, but it's fairly easy to see how something like this could
> happen if there are any places where InvalidBlockNumber is
> unintentionally passed to ReadBuffer --- that would look like a P_NEW
> call and it *wouldn't* be interlocked. So it would be worth changing
> P_NEW to "(-2)" (this should just take a change in bufmgr.h and
> recompile) and adding an "Assert(blockNum != InvalidBlockNumber)"
> at the head of ReadBufferInternal(). Then rebuild with asserts enabled
> and see if the failure case provokes that assert.

I'll try this too.
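A minimal C sketch of the sentinel collision in point 1, under stated assumptions. `BlockNumber` and `InvalidBlockNumber` follow PostgreSQL's own definitions (a 32-bit unsigned block number, with all bits set meaning "invalid"). `P_NEW_BEFORE`, `P_NEW_AFTER`, `ReadAction` and `classify()` are hypothetical names used only for illustration; this is not the real bufmgr code. The sketch shows that before the change a stray InvalidBlockNumber cannot be told apart from P_NEW. It also shows how a distinct "(-2)" sentinel plus the proposed Assert separates the two cases.

```c
#include <assert.h> /* assert() stands in for PostgreSQL's Assert() in checks */
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Before: P_NEW is the same bit pattern as InvalidBlockNumber. */
#define P_NEW_BEFORE InvalidBlockNumber
/* After: a distinct sentinel; (BlockNumber) -2 == 0xFFFFFFFE. */
#define P_NEW_AFTER ((BlockNumber) -2)

/* Hypothetical outcome of a ReadBufferInternal()-style call. */
typedef enum
{
    READ_EXISTING,   /* ordinary read of an existing block */
    EXTEND_RELATION, /* treated as a P_NEW request */
    ASSERT_FAILED    /* the proposed Assert(blockNum != InvalidBlockNumber) fires */
} ReadAction;

/*
 * Classify a requested block number. p_new is whichever P_NEW definition is
 * in effect; check_invalid models building with the proposed assertion.
 */
static ReadAction classify(BlockNumber blockNum, BlockNumber p_new,
                           int check_invalid)
{
    if (check_invalid && blockNum == InvalidBlockNumber)
        return ASSERT_FAILED;
    if (blockNum == p_new)
        return EXTEND_RELATION;
    return READ_EXISTING;
}
```

With the old definition, `classify(InvalidBlockNumber, P_NEW_BEFORE, 0)` yields `EXTEND_RELATION`, which is the uninterlocked extension Tom describes. With the new sentinel and the check enabled, the same stray value yields `ASSERT_FAILED`, while a genuine `P_NEW_AFTER` request still extends.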

> 2. I'm also eyeing this bit of code in hio.c:
>
> /*
>  * If the FSM knows nothing of the rel, try the last page before
>  * we give up and extend. This avoids one-tuple-per-page syndrome
>  * during bootstrapping or in a recently-started system.
>  */
> if (targetBlock == InvalidBlockNumber)
> {
>     BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
>
>     if (nblocks > 0)
>         targetBlock = nblocks - 1;
> }
>
> If someone else has just extended the relation, it's possible that this
> will allow a process to get to the page before the intended extender has
> finished initializing it. AFAICT that's not harmful because the page
> will look like it has no free space ... but it seems a bit fragile.
> If you dike out the above-mentioned code, can you still provoke the
> failure?

By "dike out", do you mean remove it? Please confirm and I'll try it.
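A rough C model of the target-block choice in the quoted hio.c fragment. It is not the real hio.c. It assumes a hypothetical `MockPage` type in place of real buffer pages, and a hypothetical `use_last_page_fallback` flag that stands in for diking the fallback out. A page that has been extended but not yet initialized reports zero free space. That is why the race Tom describes should be harmless, if fragile: a backend that grabs the last page too early should simply move on.

```c
#include <assert.h> /* used by the usage checks */
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Hypothetical page model: a freshly extended page is all zeroes until the
 * extending backend initializes it (cf. PageIsNew()).
 */
typedef struct
{
    int    initialized; /* 0 until the extender sets up the page header */
    size_t free_space;  /* meaningful only once initialized */
} MockPage;

/* An uninitialized (all-zero) page reports no usable free space. */
static size_t page_free_space(const MockPage *page)
{
    return page->initialized ? page->free_space : 0;
}

/*
 * Pick a target block as the quoted fragment does. When the fallback is
 * disabled ("diked out"), an empty FSM answer goes straight to extension.
 * A result of InvalidBlockNumber means "extend the relation".
 */
static BlockNumber choose_target(BlockNumber fsm_hint, BlockNumber nblocks,
                                 int use_last_page_fallback)
{
    BlockNumber targetBlock = fsm_hint;

    if (use_last_page_fallback && targetBlock == InvalidBlockNumber)
    {
        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }
    return targetBlock;
}
```

With the fallback enabled and an empty FSM, a 10-block relation yields block 9, possibly a page another backend has just added. With the fallback removed, the same call returns `InvalidBlockNumber` and the caller extends instead. That difference is what the experiment would test.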

> A different line of attack is to see if you can make a self-contained
> test case so other people can try to reproduce it. More eyeballs on the
> problem are always better.

Can't really see this being possible. This is clearly a very unusual
problem, and without similar hardware I doubt that anyone else will
trigger it. We ran this system happily for nearly a year on the
previous kernel without experiencing this problem (TCP lockups are a
different matter). Also, the load is generated by a bunch of servers
and robots simulating rising and falling demand.

> Lastly, it might be interesting to look at the WAL logs for the period
> leading up to a failure. This would give us an idea of what was
> happening concurrently with the processes that seem directly involved.

Next time we reproduce it, I'll take a copy of the WAL files too.

__
Marc
