Re: Fwd: index corruption in PG 8.3.13

From: Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: index corruption in PG 8.3.13
Date: 2011-03-11 11:17:58
Message-ID: AANLkTimSXMshgbfe0vc=Z4TG-p31R6MPM_YQBXCCw3qK@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

>>> 1. Somebody inserts a bunch of new tuples into the relation, causing
>>> growth in the index.
>>
>> In case it's not obvious VACUUM FULL would do precisely that.
>
> Oh, I didn't even think about that.  Yeah, that could be it, too.

Thanks a lot Greg and Robert. This theory seems very plausible. VF
must have carried out a rearrangement of heap tuples for compaction
and that might have caused new index entries which might explain the
extension of that many blocks.

> maybe VACUUM FULL - crash before checkpoint - problem with recovery?

Did I mention that an immediate shutdown was thrown into the mix just
after the VF? That is almost close to a crash and that means that the
shared buffers were not written back to the index data file. So that
should also account for all these pages still being zeroed out. So
change the above to:

> VACUUM FULL - immediate shutdown - problem with recovery?

But WAL replay should still have handled this. I would presume even an
immediate shutdown ensures that WAL is flushed to disk properly?

So that means that either there is a corner case bug in VF which adds
incorrect WAL logging in some specific btree layout scenarios or there
was indeed some bit flipping in the WAL, which caused the recovery to
prematurely end during WAL replay. What are the scenarios that you
would think can cause WAL bit flipping?

I was trying to repro this on the setup by repeatedly creating a table
with large inserts, doing lotta deletes, running VF and then issuing
immediate shutdown. However if I try to inspect the index data file at
this point in the test case, it is inconsequential as the file is
largely out of sync since its dirty shared buffers have not been
flushed. That leaves me with the option to restart and check the index
data file again for problems. If we see problems after the restart it
should generally mean WAL logging errors (but we still cannot discount
the bit flipping case I guess).

I guess a perusal of the WAL activity done by VF btree index activity
is the need of the hour..

Regards,
Nikhils

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2011-03-11 11:26:41 Re: B-tree parent pointer and checkpoints
Previous Message Bruce Momjian 2011-03-11 11:13:20 Re: [HACKERS] Couple document fixes