Re: Fwd: index corruption in PG 8.3.13

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: index corruption in PG 8.3.13
Date: 2011-03-11 13:14:22
Message-ID: AANLkTi=F6XQkVRFpLLO+JwbnDT+=LXj5=hDh=gZHMxmD@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 11, 2011 at 6:17 AM, Nikhil Sontakke
<nikhil(dot)sontakke(at)enterprisedb(dot)com> wrote:
>> VACUUM FULL - immediate shutdown - problem with recovery?

An immediate shutdown == an intentional crash. OK, so you have the
VACUUM FULL and the immediate shutdown just afterward. So we just
need to figure out what happened during recovery.

> But WAL replay should still have handled this. I would presume even an
> immediate shutdown ensures that WAL is flushed to disk properly?

I'm not sure, but I doubt it. If the VACUUM FULL committed, then its
WAL records should be on disk; but if the immediate shutdown happened
while it was still running, the WAL records might still have been in
wal_buffers, in which case I don't think they'll get written out, and
all-zero pages in the index are to be expected. That doesn't explain
any other corruption in the file, but I believe all-zero pages in a
relation are an expected consequence of an unclean shutdown. If the VF
actually committed before the immediate shutdown, though, there must
be something else going on, since by that point XLOG should have been
flushed.
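A toy model may make the committed-versus-still-running distinction above concrete. The class and record names below are hypothetical and this is not PostgreSQL's code; it only assumes that a commit flushes WAL up to its commit record before returning (the synchronous_commit = on behaviour) and that an immediate shutdown loses whatever is still sitting in the buffer.

```python
# Toy model (hypothetical, not PostgreSQL code): WAL records accumulate in a
# volatile buffer and only survive a crash once they have been flushed.


class ToyWal:
    def __init__(self):
        self.buffer = []  # stands in for wal_buffers (lost on crash)
        self.disk = []    # stands in for the WAL files (durable)

    def insert(self, record):
        self.buffer.append(record)

    def flush(self):
        # Move everything buffered so far to durable storage.
        self.disk.extend(self.buffer)
        self.buffer.clear()

    def commit(self, xid):
        # Insert a commit record, then flush WAL up to it before returning.
        self.insert(("commit", xid))
        self.flush()

    def crash(self):
        # Immediate shutdown: anything not yet flushed is gone.
        self.buffer.clear()
        return list(self.disk)


def survives(committed_before_crash):
    """Return the WAL records recovery would see after the crash."""
    wal = ToyWal()
    wal.insert(("vacuum_full_move", 1))  # hypothetical record names
    wal.insert(("btree_insert", 1))
    if committed_before_crash:
        wal.commit(1)
    return wal.crash()
```

Here survives(True) keeps all three records and survives(False) keeps none. The real server can get WAL to disk earlier as well (when wal_buffers fills, from the WAL writer, or because a dirty data page may not be written out before the WAL describing it), which is why the still-running case is "might be lost" rather than "is lost".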

> So that means that either there is a corner-case bug in VF which emits
> incorrect WAL records in some specific btree layout scenarios, or there
> was indeed some bit flipping in the WAL, which caused recovery to end
> prematurely during WAL replay. What scenarios do you think could cause
> WAL bit flipping?

Some kind of fluke hard drive malfunction, maybe? I know that the
incidence of a hard drive being told to write A and actually writing B
is very low, but it's probably not exactly zero. Do you have the logs
from the recovery following the immediate shutdown? Anything
interesting there?

Or, as you say, there could be a corner-case VF bug.
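On the bit-flipping theory: WAL records carry a CRC, and recovery treats the first record that fails its check as the end of WAL, so a flipped bit would stop replay early and skip everything after it rather than apply garbage. The sketch below is only an illustration under stated assumptions: the record layout (4-byte length, 4-byte checksum, payload) is made up, and zlib.crc32 stands in for whatever checksum the server actually uses.

```python
import zlib


def make_record(payload: bytes) -> bytes:
    # Hypothetical layout: 4-byte length, 4-byte CRC-32, then the payload.
    crc = zlib.crc32(payload)
    return len(payload).to_bytes(4, "little") + crc.to_bytes(4, "little") + payload


def replay(log: bytes) -> list:
    """Return the payloads replayed before the first invalid record."""
    applied = []
    pos = 0
    while pos + 8 <= len(log):
        length = int.from_bytes(log[pos:pos + 4], "little")
        crc = int.from_bytes(log[pos + 4:pos + 8], "little")
        payload = log[pos + 8:pos + 8 + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # treated as end of WAL; later records are never applied
        applied.append(payload)
        pos += 8 + length
    return applied


if __name__ == "__main__":
    log = bytearray(b"".join(make_record(p) for p in [b"rec1", b"rec2", b"rec3"]))
    log[21] ^= 0x01  # flip one bit inside the payload of the second record
    print(replay(bytes(log)))  # only the first record survives
```

Note that the third record is intact but still never replayed, which is the "prematurely end" symptom: everything logged after the damaged record is lost to recovery.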

> I was trying to repro this on the setup by repeatedly creating a table
> with large inserts, doing lots of deletes, running VF and then issuing
> an immediate shutdown. However, inspecting the index data file at this
> point in the test case tells us little, because the file is largely
> out of sync: its dirty shared buffers have not been flushed. That
> leaves me with the option of restarting and checking the index data
> file again for problems. If we see problems after the restart, it
> should generally mean WAL logging errors (but we still cannot discount
> the bit-flipping case, I guess).

contrib/pageinspect might help.
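As a complement to pageinspect, a quick scan of the raw index file can at least list which blocks are entirely zero. This is a minimal sketch, assuming the default 8 kB block size (SHOW block_size reports the real one) and that the file is read with the server stopped, or after a CHECKPOINT, so it is not out of sync with shared buffers. Large relations are split into 1 GB segment files (the file named after pg_class.relfilenode, then .1, .2 and so on), and the block numbers printed are relative to the segment being scanned.

```python
import sys

BLCKSZ = 8192  # assumption: default block size; confirm with SHOW block_size


def zero_pages(path, blcksz=BLCKSZ):
    """Yield block numbers (within this file) whose bytes are all zero."""
    with open(path, "rb") as f:
        blkno = 0
        while True:
            block = f.read(blcksz)
            if not block:
                break
            if not any(block):  # every byte is 0x00
                yield blkno
            blkno += 1


if __name__ == "__main__":
    # Usage: python zero_pages.py <path-to-relation-segment-file>
    for blkno in zero_pages(sys.argv[1]):
        print(blkno)
```

A flagged block can then be examined from SQL with pageinspect, e.g. SELECT * FROM page_header(get_raw_page('some_index', 42)); where the index name and block number are placeholders.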

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
