Re: PANIC: failed to re-find parent key in "100924" for split pages 1606/1673

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: valiouk(at)yahoo(dot)co(dot)uk, pgsql-bugs(at)postgresql(dot)org
Subject: Re: PANIC: failed to re-find parent key in "100924" for split pages 1606/1673
Date: 2009-01-08 20:38:49
Message-ID: 27285.1231447129@sss.pgh.pa.us
Lists: pgsql-bugs

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> But with a down server, you just force people to do pg_resetxlog, which
> loses both the corruption (probably) and real, useful data (likely) and
> *then* they bring up the server. I don't see why we should force people
> to take a manual action and lose data to bring up the server.

That's all fine, but simply reducing the message level from PANIC to LOG
remains an utterly unacceptable "solution". What will happen is that
the server will start, the DBA will go back to sleep after ignoring
(most likely, never even reading) the log message, and the corruption
will get worse. The potential consequences of corruption in a pg_class
index, for example, are just horrid. Frankly I'd rather "rm -rf $PGDATA"
and force someone to go back to their last backup than let them continue
to run with a database that is known to be broken and the system didn't
do anything more to warn them than emit a LOG message someplace.

(No, I'm not seriously proposing that as a recovery technique. But it's
no more irresponsible than ignoring a corruption condition.)

regards, tom lane
