Re: right sibling is not next child

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: right sibling is not next child
Date: 2006-04-13 01:30:59
Message-ID: 6646.1144891859@sss.pgh.pa.us
Lists: pgsql-bugs

I wrote:
> Does that index contain any sensitive data, and if not could I trouble
> you for a copy? I'm still not clear on the mechanism by which the
> indexes got corrupted like this.

Oh, never mind ... I've sussed it.

nbtxlog.c's forget_matching_split() assumes it can look into the page
that was just updated to get the block number associated with a non-leaf
insertion. This is OK *only if the page has exactly its state at the
time of the WAL record*. However, btree_xlog_insert() is coded to do
nothing if the page has an LSN larger than the WAL record's LSN --- that
is, if the page reflects a state *later than* this insertion. So if the
page is newer than that --- say, there were some subsequent insertions at
earlier positions in the page --- forget_matching_split() would pick up
the wrong downlink and hence fail to erase the pending split it should
have erased.
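To make the failure mode concrete, here is a minimal toy model of the replay logic described above. It is a sketch under stated assumptions, not nbtxlog.c. `SimPage`, `InsertRecord`, `redo_insert`, `downlink_from_page`, `downlink_from_record` and `forget_pending_split` are all hypothetical names. Pending splits are assumed to be keyed by the child block number that the downlink points to. The record-based lookup is only one plausible way to make this safer; the message does not say how HEAD was actually changed.

```python
from dataclasses import dataclass, field

# Toy model only: these names and structures are hypothetical, not nbtxlog.c.

@dataclass
class SimPage:
    lsn: int                                            # LSN of the last change reflected in the page
    downlinks: list[int] = field(default_factory=list)  # child block numbers, in key order

@dataclass(frozen=True)
class InsertRecord:
    lsn: int        # LSN of this WAL record
    offset: int     # position at which the downlink was inserted
    downlink: int   # child block number carried in the record itself

def redo_insert(page: SimPage, rec: InsertRecord) -> None:
    """Replay the insertion unless the page already reflects a later state."""
    if page.lsn >= rec.lsn:
        return                                          # page is newer: do nothing
    page.downlinks.insert(rec.offset, rec.downlink)
    page.lsn = rec.lsn

def downlink_from_page(page: SimPage, rec: InsertRecord) -> int:
    """Page-based lookup: correct only if the page is exactly at rec.lsn."""
    return page.downlinks[rec.offset]

def downlink_from_record(page: SimPage, rec: InsertRecord) -> int:
    """Record-based lookup: independent of the page's current state."""
    return rec.downlink

def forget_pending_split(pending: set[int], blk: int) -> bool:
    """Erase the pending split keyed by blk; report whether one was found."""
    if blk in pending:
        pending.remove(blk)
        return True
    return False

def replay(lookup) -> tuple[bool, set[int]]:
    # Page history: [3, 5] -R1 (LSN 100)-> [3, 7, 5] -R2 (LSN 200)-> [9, 3, 7, 5]
    r1 = InsertRecord(lsn=100, offset=1, downlink=7)
    page = SimPage(lsn=200, downlinks=[9, 3, 7, 5])  # on-disk page already past R1 and R2
    pending = {7}                                    # the split that R1 completes
    redo_insert(page, r1)                            # skipped: 200 >= 100
    erased = forget_pending_split(pending, lookup(page, r1))
    return erased, pending

print(replay(downlink_from_page))    # (False, {7}): read block 3, split left pending
print(replay(downlink_from_record))  # (True, set())
```

The later insertion at offset 0 shifts R1's downlink from offset 1 to offset 2, so the page-based lookup reads block 3 instead of 7. Starting instead from `SimPage(lsn=50, downlinks=[3, 5])`, the redo is applied and both lookups return 7.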

I believe this bug is only latent whenever full_page_writes = on,
because in that situation the first touch of any index page after a
checkpoint will rewrite the whole page, and so we'll never be looking
at an index page state newer than the WAL record. That explains why
no one has tripped over it before.
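The reasoning about full_page_writes can be illustrated with a variant of the same kind of toy model. This is a simplified sketch that assumes a record carrying a full-page image is replayed by overwriting the page wholesale with that image, taken to be the page as it stood right after the record, whatever LSN the on-disk page has. `Page`, `Record`, `redo` and `lookup_after_r1` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Toy model only: hypothetical structures, not PostgreSQL's WAL format.

@dataclass
class Page:
    lsn: int
    downlinks: list[int]

@dataclass(frozen=True)
class Record:
    lsn: int
    offset: int
    downlink: int
    image: Optional[tuple[int, ...]] = None  # assumed full-page image: page right after this record

def redo(page: Page, rec: Record) -> int:
    """Replay rec, then return what a page-based downlink lookup would see."""
    if rec.image is not None:
        page.downlinks = list(rec.image)      # first touch after checkpoint: overwrite the page
        page.lsn = rec.lsn
    elif page.lsn < rec.lsn:
        page.downlinks.insert(rec.offset, rec.downlink)
        page.lsn = rec.lsn
    # otherwise the page is already newer than rec and is left alone
    return page.downlinks[rec.offset]

def lookup_after_r1(full_page_writes: bool) -> int:
    disk = Page(lsn=200, downlinks=[9, 3, 7, 5])   # on-disk page already past R1
    image = (3, 7, 5) if full_page_writes else None
    r1 = Record(lsn=100, offset=1, downlink=7, image=image)
    return redo(disk, r1)

print(lookup_after_r1(True))   # 7: the image rewinds the page to exactly R1's state
print(lookup_after_r1(False))  # 3: the newer page is kept and the wrong downlink is read
```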

The particular case we are looking at in Panel_pkey seems to require
some additional assumptions to explain the state of the index, but
I've got no doubt this is the core of the problem.

Since we're not going to support full_page_writes = off in 8.1.*,
there's no need for a back-patched fix, but I'll see about making it
safer in HEAD.

regards, tom lane
