Re: Indexes getting corrupted.

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bruno G(dot) Albuquerque" <balbuquerque(at)dba(dot)com(dot)br>, Scott Marlowe <smarlowe(at)g2switchworks(dot)com>, pgsql-admin(at)postgresql(dot)org, David Parker <dparker(at)tazznetworks(dot)com>
Subject: Re: Indexes getting corrupted.
Date: 2005-06-17 21:37:19
Message-ID: 1119044239.3645.234.camel@localhost.localdomain
Lists: pgsql-admin

On Thu, 2005-06-16 at 08:25 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > I'm suspicious of a more subtle intermittent error.
>
> Yeah, I am too, but so far none of the reporters have been cooperative
> about providing more information :-(
>
> > We have no
> > information about what the magic values are, only that they are not
> > correct. Should we increase the information returned for that error?
>
> I think the proposed patch is a waste of time. What I am hoping to get
> from people is a dump of the whole first page of the corrupted index
> (via pg_filedump, or even good ol' od). That might give us some idea of
> what we are dealing with --- localized corruption in a basically good
> metapage, or wholesale replacement of the page with some other page (and
> if so what), or maybe it is a hardware fault after all. You can't draw
> those sorts of conclusions from one or two words, but with a whole page
> to look at you have a shot at telling the difference.

Sorry, let me explain. I had assumed that the reporters provided no
further information because they could not, not because they would not.

The patch is trivial and your suggested methods of looking into this are
much preferable. However, they rely on somebody being able to locate
the metapage, and on their not having already overwritten the whole
thing and carried on. Because the bug is intermittent, re-running allows
work to continue, and that seems to be exactly what reporters do.

My patch would allow some more information to be retrieved in the
meantime, while we hope for somebody to upload a damaged metapage. If
the patch is not sufficient, then I'd suggest that the metapage be
dumped to log so it can be more easily provided for inspection.
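
For anyone who can get at the file before overwriting it, here is a
minimal sketch of the od-style dump described above. It assumes the
default 8192-byte block size and that you already know which file under
the data directory (base/<database oid>/<relfilenode>) holds the index.
The script name and function names are made up for illustration; this is
not part of PostgreSQL, and pg_filedump remains the better tool where it
is available.

```python
import sys

BLOCK_SIZE = 8192  # assumption: default PostgreSQL BLCKSZ; adjust for custom builds


def read_first_page(path, block_size=BLOCK_SIZE):
    """Return the first block of the file (the metapage for a btree index)."""
    with open(path, "rb") as f:
        return f.read(block_size)


def hexdump(data, width=16):
    """Format bytes as offset, hex, and printable-ASCII columns, one row per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  |{text_part}|")
    return lines


if __name__ == "__main__":
    # Usage (hypothetical script name and path):
    #   python dump_metapage.py /path/to/data/base/<database oid>/<relfilenode>
    if len(sys.argv) == 2:
        print("\n".join(hexdump(read_first_page(sys.argv[1]))))
```

Redirecting the output to a file gives something small enough to attach
to a mail to the list, ideally captured before the index is rebuilt.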

In general, corruption of this kind usually turns out to be a hardware
fault. But all reporters of this error in 8.0 have reported only index
errors and no others, which leads me to suspect a software-related cause.

Best Regards, Simon Riggs
