Re: right sibling is not next child

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: right sibling is not next child
Date: 2006-04-12 20:28:41
Message-ID: 3355.1144873721@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

"Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov> writes:
> One thing that seems strange to me is that the original crash on
> Thursday failed on Panel_pkey, but my "vacuum analyze verbose" on a copy
> of the crashed database failed on MaintCode /
> pg_statistic_relid_att_index.

I can't find anything particularly wrong with
pg_statistic_relid_att_index. There are two index entries for many of
the key values, but that's unsurprising if a database-wide vacuum
analyze had been done or in progress --- just-obsoleted pg_statistic
rows wouldn't have been vacuumed away yet. I'm inclined to think that
this problem is a side-effect of the fact that you had to use
pg_resetxlog; there was probably an update to pg_statistic that got
lost. If you want to continue experimenting with this database,
I'd suggest TRUNCATE'ing pg_statistic and rebuilding it via another
VACUUM ANALYZE run.

> I'll send over pg_statistic_relid_att_index and Panel_pkey. Showing
> the keys to Panel_pkey is no problem.

My little index printing tool shows these entries in Panel_pkey at btree
level zero:

...
40 2006 RES032706 CH0327RES high key on page 606
40 2006 RES032706 RES032706 383 11
40 2006 RES032706 RES032706 high key on page 608
40 2006 RES040306 CC0403RES 507 14
40 2006 RES040306 CCC0403RES 551 1
40 2006 RES040306 CCC0403RES high key on page 601
40 2006 RES040306 RES040306 500 1
40 2006 RES040306 RES040306 high key on page 640
40 2006 RES040306 RES040306 high key on page 636
40 2006 RES040306 RES040306 high key on page 635
41 0001 2000POOL 0001 159 3
41 0001 2000POOL 0002 159 4
41 0001 2000POOL 0003 159 5
...

(The first four columns are the key values of this index; the last two
are the pointed-to heap tuple's page/line location. High keys are
printed after any data keys on the page.) The down-links at level one
look like:

...
40 2006 RES032706 CCC0327RES 606 1
40 2006 RES032706 CH0327RES 608 1
40 2006 RES032706 RES032706 601 1
40 2006 RES040306 CCC0403RES 640 1
40 2006 RES040306 RES040306 635 1
40 2006 RES040306 RES040306 636 1
40 2006 RES040306 RES040306 635 1
40 2006 RES040306 RES040306 629 1
41 0405 0405 0105 166 1
...

This is fairly interesting because we've got three pages with the same
boundary key. The bogus entry for page 635 has been inserted where
you'd expect it to get inserted if the insertion were being done on the
basis of key comparison. (We usually insert a new entry in front of any
ones with the same key.) But we never do insertions on non-leaf pages
by key comparison! Upper-level entries are only made during page
splits, by _bt_insert_parent, and that always works by locating the
down-link to the page just split and adding the new entry just after it.

One thing I realize after seeing this is that the index corruption might
be of long standing: the way that the btree search algorithms work, no
search would ever have descended directly to 635 or 636, but instead to
640 and then traverse right from there. So you'd not have noticed any
malfunction, until the time came to try to delete the page. That means
we shouldn't assume that the problem was created recently.

Now that we know the problem is associated with keys '40 2006 RES040306
RES040306', can you check your logs and see if anything interesting
happened around the time those keys were being inserted? (Note: if
Panel_pkey really is a primary key, ie unique, it might seem odd for
there to be several pages worth of identical entries --- this would
imply that the same record had been updated several hundred times
between two vacuums.)

I still kinda like the theory that the extra key got in there because
btree_xlog_cleanup fired inappropriately ... mainly because I can't
think of any other theories ... but I'm still baffled about the details.
Anyone have any ideas?

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Bruce Momjian 2006-04-12 21:46:39 Re: contrib/intarray/_int_gist.c
Previous Message Peter Brant 2006-04-12 16:40:27 Re: right sibling is not next child