Re: 8.1 index corruption woes

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.1 index corruption woes
Date: 2008-07-11 18:02:23
Message-ID: 20080711180223.GI4110@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > > We've detected what I think is some sort of index corruption in 8.1.
> > > The server is running 8.1.11, so AFAICT the problem with truncated pages
> > > in vacuum is already patched and accounted for (i.e. we reindexed, and a
> > > bit later the problem presented itself again). There haven't been any
> > > relevant fixes after that AFAICT.
> >
> > > What we see is that after a bit of updating the index, it starts having
> > > tuples that poing to heap entries which are marked unused.
> >
> > Do you actually see any observed problem, or is this conclusion based
> > entirely on your pg_filedump analysis?
>
> Well, yeah, this all started because the guys started getting weird
> results in queries, and found out that disabling index scans returned
> different results.

I neglected to mention that further analysis of the failed index scans
showed that index entries were pointing to heap tuples with completely
different data.

I was tracking one of those, when I noticed that in the same index page
was an entry pointing to a heap entry marked as unused.

> > I suspect a problem with your analysis script, although a quick scan
> > of the code didn't find an issue.

I don't want to discard this idea, because we're getting a very
unusually high number of bogus entries. However, they are all (or a
very high percentage of them) the very first entry on each index page.
I want to confirm that the leftmost on a leaf btree page is a valid
item, and not something like the lower bound value? (I think we only
store high bounds on internal pages, not leaf pages, but I'm not sure).

> > Another point to keep in mind, if you are trying to analyze files
> > belonging to a live database, is that what you can see in the filesystem
> > may not be the "current" contents of every page. For typical access
> > patterns it'd be unsurprising for the visible index pages to lag behind
> > those of the heap, since they'd be "hotter" and tend to stay in shared
> > buffers longer.

It was confirmed that the servers are not live, and a checkpoint has
been executed by the recovery code.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-07-11 19:17:08 Re: 8.1 index corruption woes
Previous Message Alvaro Herrera 2008-07-11 17:06:49 _bt_delitems: change before WAL?