Re: strange nbtree corruption report

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strange nbtree corruption report
Date: 2011-11-22 17:23:17
Message-ID: 1321981982-sup-2149@alvh.no-ip.org
Lists: pgsql-hackers


Excerpts from Tom Lane's message of mar nov 22 01:14:33 -0300 2011:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:

> > ERROR: left link changed unexpectedly in block 3378 of index "index_name"
> > CONTEXT: automatic vacuum of table "table_name"
>
> > This was reported not once, but several dozens of times, by each new
> > autovacuum worker that tried to vacuum the table.
>
> > As far as I can see, there is just no way for this to happen ... much
> > less happen repeatedly.
>
> It's not hard to believe that that would happen repeatedly given a
> corrupted set of sibling links, eg deletable page A links left to page
> B, which links right to C, which links right to A. The question is how
> the index got into such a state. A dropped update during a page split
> would explain it (ie, B used to be A's left sibling, then at some point
> B got split into B and C, but A's left-link never got updated on disk).
> I wonder how reliable their disk+filesystem is ...

Well, there are no other signs of random data corruption, such as
corrupted toast pointers, which are the number one symptom when the
underlying storage is flaky. However, it is possible that there was a
transient storage problem that affected only this one page; if the
resulting damage persisted on disk in the way you describe, it might
well explain these symptoms.
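
To make the sibling-link scenario above concrete, here is a minimal sketch in Python. It is a toy model, not PostgreSQL's nbtree code: the `Page` type, the page names, and `find_inconsistent_left_links` are all hypothetical. It only checks the invariant that a page's left sibling should point right back at it, and reproduces the case where B was split into B and C but A's left-link was never updated.

```python
from typing import Dict, List, NamedTuple, Optional


class Page(NamedTuple):
    """Toy stand-in for a btree page: only the sibling links are modeled."""
    left: Optional[str]
    right: Optional[str]


def find_inconsistent_left_links(pages: Dict[str, Page]) -> List[str]:
    """Return pages whose left sibling does not link right back to them."""
    bad = []
    for name, page in pages.items():
        if page.left is None:
            continue  # leftmost page on its level, nothing to check
        left_sibling = pages.get(page.left)
        if left_sibling is None or left_sibling.right != name:
            bad.append(name)
    return bad


# State after B was split into B and C, with A's left-link update lost:
# A still links left to B, B links right to C, C links right to A.
pages = {
    "B": Page(left=None, right="C"),
    "C": Page(left="B", right="A"),
    "A": Page(left="B", right=None),  # stale left-link, should be "C"
}

stale = find_inconsistent_left_links(pages)
```

In this model the inconsistency is permanent: every pass over the same on-disk state flags A again, which fits an error repeated by each new autovacuum worker.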

Another thing I noticed is that there was corruption in heap pages (not
on the same server, though; it was a different Londiste slave). This was
even stranger: the pages were completely fine, except that the first six
words, which belong to the page header, were all zeros. Once I filled
them with valid-looking data (mostly bytes copied from neighboring
pages), the rest of the page decoded fine.
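
A rough way to look for pages with this pattern is sketched below in Python. It rests on assumptions not stated in this message: 8192-byte pages (the default block size) and "six words" meaning six 4-byte words, i.e. 24 bytes. The function name is hypothetical, and it works on raw bytes rather than through any PostgreSQL API.

```python
from typing import List

PAGE_SIZE = 8192    # assumption: default block size
HEADER_BYTES = 24   # assumption: six 4-byte words


def pages_with_zeroed_header(data: bytes,
                             page_size: int = PAGE_SIZE,
                             header_bytes: int = HEADER_BYTES) -> List[int]:
    """Return block numbers whose leading bytes are all zero while the
    rest of the page still contains data."""
    suspects = []
    for offset in range(0, len(data) - page_size + 1, page_size):
        page = data[offset:offset + page_size]
        header, body = page[:header_bytes], page[header_bytes:]
        # A page that is zero throughout is skipped: it does not match
        # the "header wiped, body intact" pattern described above.
        if not any(header) and any(body):
            suspects.append(offset // page_size)
    return suspects


# Synthetic three-page example: intact page, wiped header, all-zero page.
intact = b"\x01" * PAGE_SIZE
wiped = b"\x00" * HEADER_BYTES + b"\x01" * (PAGE_SIZE - HEADER_BYTES)
empty = b"\x00" * PAGE_SIZE
suspects = pages_with_zeroed_header(intact + wiped + empty)
```

Against real data, the input would be the bytes of a copy of the relation's file; the check only reports candidates and makes no attempt to repair them.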

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
