From: | Marc Munro <marc(at)bloodnok(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Index corruption |
Date: | 2006-06-30 02:00:22 |
Message-ID: | 1151632823.3913.97.camel@bloodnok.com |
Lists: | pgsql-hackers |
On Thu, 2006-06-29 at 21:47 -0400, Tom Lane wrote:
> One easy thing that would be worth trying is to build with
> --enable-cassert and see if any Asserts get provoked during the
> failure case. I don't have a lot of hope for that, but it's
> something that would require only machine time not people time.
I'll try this tomorrow.
> A couple other things to try, given that you can provoke the failure
> fairly easily:
>
> 1. In studying the code, it bothers me a bit that P_NEW is the same as
> InvalidBlockNumber. The intended uses of P_NEW appear to be adequately
> interlocked, but it's fairly easy to see how something like this could
> happen if there are any places where InvalidBlockNumber is
> unintentionally passed to ReadBuffer --- that would look like a P_NEW
> call and it *wouldn't* be interlocked. So it would be worth changing
> P_NEW to "(-2)" (this should just take a change in bufmgr.h and
> recompile) and adding an "Assert(blockNum != InvalidBlockNumber)"
> at the head of ReadBufferInternal(). Then rebuild with asserts enabled
> and see if the failure case provokes that assert.
I'll try this too.
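To check that I understand the change, here is a minimal standalone sketch of it. This is not the bufmgr code: the typedef and macros are stand-ins for the real header definitions, and `looks_like_extend` and `passes_proposed_assert` are names I have made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions (assumed, not copied from the headers). */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Today: P_NEW is the same value as InvalidBlockNumber. */
#define P_NEW_OLD InvalidBlockNumber
/* Proposed: "(-2)", which as a BlockNumber is 0xFFFFFFFE. */
#define P_NEW_PROPOSED ((BlockNumber) -2)

/*
 * Hypothetical helper: would a read request for blockNum be treated as
 * "extend the relation", given a particular definition of P_NEW?
 */
int
looks_like_extend(BlockNumber blockNum, BlockNumber p_new)
{
    return blockNum == p_new;
}

/*
 * Hypothetical helper mirroring the suggested
 * Assert(blockNum != InvalidBlockNumber): returns 1 if the assert would
 * pass, 0 if it would fire.
 */
int
passes_proposed_assert(BlockNumber blockNum)
{
    return blockNum != InvalidBlockNumber;
}
```

With the old definition a stray InvalidBlockNumber is indistinguishable from a P_NEW request; with the proposed one it no longer looks like an extend and trips the assert instead, while a genuine P_NEW call still passes.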
> 2. I'm also eyeing this bit of code in hio.c:
>
>     /*
>      * If the FSM knows nothing of the rel, try the last page before
>      * we give up and extend. This avoids one-tuple-per-page syndrome
>      * during bootstrapping or in a recently-started system.
>      */
>     if (targetBlock == InvalidBlockNumber)
>     {
>         BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
>
>         if (nblocks > 0)
>             targetBlock = nblocks - 1;
>     }
>
> If someone else has just extended the relation, it's possible that this
> will allow a process to get to the page before the intended extender has
> finished initializing it. AFAICT that's not harmful because the page
> will look like it has no free space ... but it seems a bit fragile.
> If you dike out the above-mentioned code, can you still provoke the
> failure?
By dike out, you mean remove? Please confirm and I'll try it.
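For what it's worth, this is how I read the experiment, written as a standalone sketch rather than the real hio.c code. The typedef and macro are stand-ins for the real definitions, and `choose_target_block` and its `use_last_page_fallback` flag are hypothetical names of my own; the flag only models "code present" versus "code removed".

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions (assumed, not copied from the headers). */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Hypothetical model of the quoted fragment.
 *   fsm_result: the block the FSM suggested, or InvalidBlockNumber if the
 *               FSM knows nothing of the rel
 *   nblocks:    current number of blocks in the relation
 *   use_last_page_fallback: 1 = code as quoted, 0 = code diked out
 * A return value of InvalidBlockNumber means the caller would go on to
 * extend the relation.
 */
BlockNumber
choose_target_block(BlockNumber fsm_result, BlockNumber nblocks,
                    int use_last_page_fallback)
{
    BlockNumber targetBlock = fsm_result;

    if (use_last_page_fallback && targetBlock == InvalidBlockNumber)
    {
        /* Try the last existing page before giving up and extending. */
        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }

    return targetBlock;
}
```

So with the code removed, a backend that gets nothing from the FSM never looks at the last page and always extends, which should take the "reached the page before the extender initialized it" case out of the picture.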
> A different line of attack is to see if you can make a self-contained
> test case so other people can try to reproduce it. More eyeballs on the
> problem are always better.
Can't really see this being possible. This is clearly a very unusual
problem and without similar hardware I doubt that anyone else will
trigger it. We ran this system happily for nearly a year on the
previous kernel without experiencing this problem (TCP lockups are a
different matter). Also the load is provided by a bunch of servers and
robots simulating rising and falling load.
> Lastly, it might be interesting to look at the WAL logs for the period
> leading up to a failure. This would give us an idea of what was
> happening concurrently with the processes that seem directly involved.
Next time we reproduce it, I'll take a copy of the WAL files too.
__
Marc