Re: Database corruption help

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: John Lister <john(dot)lister-ps(at)kickstone(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Database corruption help
Date: 2009-02-14 08:26:11
Message-ID: 49968023.8030404@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

Tom Lane wrote:
> Aside from the "how did this happen" puzzle, the real point of any
> investigation of course ought to be whether we can make heap_page_prune
> more robust. At the very least it's undesirable to be leaving the page
> in a state where VACUUM FULL will decide it can't shrink.

I'm as puzzled as you are on how it happened.

The fundamental problem here is that we have HOT-updated tuples that are
missing the root tuple. heap_prune_chain doesn't know how to remove the
dead tuples from such chains, and neither does vacuum. What's worse is
that there's no index pointer to the live tuple in the chain, and even
VACUUM FULL doesn't fix that.

Could we modify heap_page_prune so that it detects such orphaned HOT
tuples, and clears the HOT_UPDATED flag for them? That would at least
let you recover. I'm pretty wary of trying to "fix" things after
corruption in general since you can't be sure what's correct and what's
not, but clearing the HOT_UPDATED flag seems safe and more likely to
help than hurt. Clearing the flag would let the tuple become indexed
again in a VACUUM FULL or REINDEX, so it would fix the inconsistency
that otherwise a sequential scan can see the tuple but an index scan can
not. It would also allow the next heap_page_prune operation to remove
the remaining dead tuples in the chain.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Simon Riggs 2009-02-15 10:18:10 Re: Database corruption help
Previous Message Scott Marlowe 2009-02-14 02:25:17 Re: Vacuum wait time problem

Browse pgsql-hackers by date

  From Date Subject
Next Message Zdenek Kotala 2009-02-14 09:32:41 Re: [patch] fix for regression tests (locale cs_CZ)
Previous Message KaiGai Kohei 2009-02-14 05:46:49 Re: Updates of SE-PostgreSQL 8.4devel patches (r1530)