3rd time is a charm.....right sibling is not next child crash.

From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: 3rd time is a charm.....right sibling is not next child crash.
Date: 2010-06-08 13:26:25
Message-ID: 920117.63233.qm@web65511.mail.ac4.yahoo.com
Lists: pgsql-general pgsql-hackers

Not looking for help; just putting some data out there.

Two previous crashes were caused by corrupt Slony indexes:

http://archives.postgresql.org/pgsql-general/2010-02/msg00022.php

http://archives.postgresql.org/pgsql-general/2009-12/msg01172.php

A new one happened yesterday:

Jun 7 15:05:01 db-1 postgres[9334]: [ID 748848 local0.crit] [3989781-1] 2010-06-07 15:05:01.087 CDT 9334PANIC: right sibling 169 of block 168 is not next child of 249 in index "sl_seqlog_idx"

This weekend we are switching from our SAN to direct-attached storage and upgrading PostgreSQL and Slony in the process. Because everything is changing, any theory that the cause is hardware, a driver, or even PostgreSQL or Slony itself is somewhat moot.

That being said, I find it 'odd' that each time this has happened, the corrupt index has been a Slony index. I can't imagine a bug in Slony corrupting PostgreSQL indexes, and I can't imagine a bug in PostgreSQL corrupting only Slony indexes, so I don't really know what to think. I'm just putting this out there in case anyone has seen similar issues or can use this data in some meaningful way.

The stack trace looks similar to last time's:

Program terminated with signal 6, Aborted.
#0 0xfecba227 in _lwp_kill () from /lib/libc.so.1
(gdb) bt
#0 0xfecba227 in _lwp_kill () from /lib/libc.so.1
#1 0xfecb598f in thr_kill () from /lib/libc.so.1
#2 0xfec61ed3 in raise () from /lib/libc.so.1
#3 0xfec41d0d in abort () from /lib/libc.so.1
#4 0x0821b8a6 in errfinish (dummy=0) at elog.c:471
#5 0x0821c74b in elog_finish (elevel=22, fmt=0x82b7780 "right sibling %u of block %u is not next child of %u in index \"%s\"") at elog.c:964
#6 0x0809e1a0 in _bt_pagedel (rel=0x867bcd8, buf=139905, stack=0x86b3768, vacuum_full=0 '\0') at nbtpage.c:1141
#7 0x0809f835 in btvacuumscan (info=0x8043f70, stats=0x86b5c30, callback=0, callback_state=0x0, cycleid=29488) at nbtree.c:936
#8 0x0809fc65 in btbulkdelete (fcinfo=0x0) at nbtree.c:547
#9 0x0821f424 in FunctionCall4 (flinfo=0x0, arg1=0, arg2=0, arg3=0, arg4=0) at fmgr.c:1215
#10 0x0809a89f in index_bulk_delete (info=0x8043f70, stats=0x0, callback=0x812ffc8 <lazy_tid_reaped>, callback_state=0x86b5818) at indexam.c:573
#11 0x0812ff54 in lazy_vacuum_index (indrel=0x867bcd8, stats=0x86b5b70, vacrelstats=0x86b5818) at vacuumlazy.c:660
#12 0x0813055a in lazy_vacuum_rel (onerel=0x867b7f8, vacstmt=0x86659b8) at vacuumlazy.c:487
#13 0x0812e910 in vacuum_rel (relid=140925368, vacstmt=0x86659b8, expected_relkind=114 'r') at vacuum.c:1107
#14 0x0812f95a in vacuum (vacstmt=0x86659b8, relids=0x8665bc0) at vacuum.c:400
#15 0x08186e16 in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:914
#16 0x08187278 in autovac_start () at autovacuum.c:178
#17 0x0818bfed in ServerLoop () at postmaster.c:1252
#18 0x0818d16d in PostmasterMain (argc=3, argv=0x833adc8) at postmaster.c:966
#19 0x08152cce in main (argc=3, argv=0x833adc8) at main.c:188
(gdb)
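Frame #6 shows the PANIC being raised in `_bt_pagedel` (nbtpage.c) while autovacuum cleans up the B-tree. Roughly, when VACUUM deletes an empty index page it expects the downlink that follows that page in the parent to point at the page's right sibling. The message reports that the two disagree: block 168's sibling link says 169, but parent 249 lists a different next child. The following is a minimal, hypothetical C model of that invariant, not PostgreSQL's actual code. `ParentPage`, `SiblingCheck` and `check_next_child` are illustrative names, and a flat array of child block numbers stands in for real index tuples.

```c
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's BlockNumber type. */
typedef unsigned int BlockNumber;

/*
 * Hypothetical, simplified parent page: downlinks to child pages in
 * left-to-right key order. Real nbtree pages store index tuples whose
 * item pointers carry the child block numbers.
 */
typedef struct
{
    BlockNumber blkno;          /* this parent's own block number */
    size_t      nchildren;      /* number of valid entries in children[] */
    BlockNumber children[16];   /* child block numbers, left to right */
} ParentPage;

typedef enum
{
    SIBLING_OK,          /* next downlink after target == right sibling */
    SIBLING_MISMATCH,    /* the inconsistency behind the PANIC text */
    SIBLING_NOT_CHECKED  /* target missing, or rightmost child (not modeled) */
} SiblingCheck;

/*
 * Checks that the downlink following `target` in `parent` points at
 * `rightsib`, i.e. that the parent level agrees with the sibling link
 * stored on the child page itself.
 */
SiblingCheck
check_next_child(const ParentPage *parent, BlockNumber target,
                 BlockNumber rightsib)
{
    for (size_t i = 0; i < parent->nchildren; i++)
    {
        if (parent->children[i] != target)
            continue;
        /* Rightmost child: its sibling's downlink lives in another parent. */
        if (i + 1 >= parent->nchildren)
            return SIBLING_NOT_CHECKED;
        return parent->children[i + 1] == rightsib ? SIBLING_OK
                                                   : SIBLING_MISMATCH;
    }
    return SIBLING_NOT_CHECKED;  /* target has no downlink in this parent */
}
```

For example, with a made-up parent whose downlinks are {4, 5, 7}, claiming that block 6 is the right sibling of block 5 yields `SIBLING_MISMATCH`. That is the same shape of inconsistency the log line reports, and it is why the real code treats it as index corruption rather than something VACUUM can repair on the fly.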
