Re: right sibling is not next child

From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: right sibling is not next child
Date: 2006-04-12 23:57:59
Message-ID: 443D4DB7020000BE00002D0A@gwmta.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The middle tier transaction log indicates this record was inserted into
the county database at 2006-03-31 21:00:32.94. It would have hit the
central databases sometime thereafter (more or less immediately if all
was well).

The Panel table contains some running statistics which are updated
frequently so a bunch of updates to the same record isn't unexpected.

Sure enough, the log on the 31st looks pretty ugly. A few selections
(this still would have been with full_page_writes = off):

(a bunch of these)
[2006-03-31 13:00:01.720 ] 2636 LOG: could not fsync segment 0 of
relation 1663/16385/1392439: Permission denied
[2006-03-31 13:00:01.720 ] 2636 ERROR: storage sync failed on magnetic
disk: Permission denied

[2006-03-31 14:31:05.314 ] 608 LOG: database system was not properly
shut down; automatic recovery in progress

[2006-03-31 14:50:54.828 ] 4052 WARNING: relation "DbTranImageStatus"
page 130 is uninitialized --- fixing
[2006-03-31 14:50:55.000 ] 4052 PANIC: right sibling is not next child
in "DbTranImageStatus_pkey" (as part of an autovacuum)

[2006-03-31 22:13:39.140 ] 2816 <postgres bigbird 127.0.0.1(1658)>
WARNING: relation "Assess" page 259792 is uninitialized --- fixing
[2006-03-31 22:19:18.484 ] 2816 <postgres bigbird 127.0.0.1(1658)>
WARNING: relation "Charge" page 338120 is uninitialized --- fixing
[2006-03-31 23:16:55.671 ] 2816 <postgres bigbird 127.0.0.1(1658)>
WARNING: relation "Case" page 391222 is uninitialized --- fixing

and finally

[2006-03-31 23:54:15.859 ] 2816 <postgres bigbird 127.0.0.1(1658)>
PANIC: right sibling is not next child in "Panel_pkey"

It looks like this has been happening with the nightly "vacuum analyze"
since then, but nobody noticed until "heap_update_redo: no block" took
things down for the count.

Thanks very much for your efforts. If nothing else, we've learned a
valuable lesson about keeping a better eye on the logs. :-(

Pete

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 04/12/06 10:28 pm >>>
One thing I realize after seeing this is that the index corruption
might
be of long standing: the way that the btree search algorithms work, no
search would ever have descended directly to 635 or 636, but instead
to
640 and then traverse right from there. So you'd not have noticed any
malfunction, until the time came to try to delete the page. That
means
we shouldn't assume that the problem was created recently.

Now that we know the problem is associated with keys '40 2006
RES040306
RES040306', can you check your logs and see if anything interesting
happened around the time those keys were being inserted? (Note: if
Panel_pkey really is a primary key, ie unique, it might seem odd for
there to be several pages worth of identical entries --- this would
imply that the same record had been updated several hundred times
between two vacuums.)

I still kinda like the theory that the extra key got in there because
btree_xlog_cleanup fired inappropriately ... mainly because I can't
think of any other theories ... but I'm still baffled about the
details.
Anyone have any ideas?

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2006-04-13 00:30:55 Re: right sibling is not next child
Previous Message Eric Noriega 2006-04-12 23:49:33 BUG #2391: "Similar to" pattern matching does not operate as documented