Re: Database corruption?

From: "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'pgsql-hackers(at)postgresql(dot)org'" <pgsql-hackers(at)postgresql(dot)org>
Cc: Alvaro Herrera <alvherre(at)atentus(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Database corruption?
Date: 2001-10-23 22:52:30
Message-ID: 3705826352029646A3E91C53F7189E325183E5@sectorbase2.sectorbase.com
Lists: pgsql-general pgsql-hackers

> >> Um, Vadim? Still of the opinion that elog(STOP) is a good
> >> idea here? That's two people now for whom that decision has
> >> turned localized corruption into complete database failure.
> >> I don't think it's a good tradeoff.
>
> > One is able to use pg_resetxlog so I don't see point in
> > removing elog(STOP) there. What do you think?
>
> Well, pg_resetxlog would get around the symptom, but at the cost of
> possibly losing updates that are further along in the xlog than the
> update for the corrupted page. (I'm assuming that the problem here
> is a page with a corrupt LSN.) I think it's better to treat flush
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
On restart, the entire content of all pages modified after the last
checkpoint should be restored from WAL. In Denis's case it looks like the
page newly allocated for the update was somehow corrupted before
heapam.c:2235 (7.1.2 src), so there was no XLOG_HEAP_INIT_PAGE flag in the
WAL record => the page content was not initialized on restart. Denis
reported a system crash - very likely due to a memory problem.
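
To make the failure mode above concrete, here is a minimal toy model in C. It is not PostgreSQL code: ToyPage, ToyRecord, TOY_INIT_PAGE and toy_redo are hypothetical names invented for illustration, with TOY_INIT_PAGE standing in for the XLOG_HEAP_INIT_PAGE flag. It only shows why redo of a record that lacks the init flag leaves pre-existing garbage on the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_INIT_PAGE 0x01   /* hypothetical stand-in for XLOG_HEAP_INIT_PAGE */

typedef struct {
    unsigned char data[TOY_PAGE_SIZE];
} ToyPage;

typedef struct {
    unsigned flags;       /* TOY_INIT_PAGE if the page must be re-initialized */
    int offset;           /* where the logged change goes */
    unsigned char byte;   /* the logged change itself */
} ToyRecord;

/* Redo one log record against a page (toy model, assumption only). */
static void toy_redo(ToyPage *page, const ToyRecord *rec)
{
    assert(rec->offset >= 0 && rec->offset < TOY_PAGE_SIZE);

    /* Only a record carrying the init flag wipes the old page content. */
    if (rec->flags & TOY_INIT_PAGE)
        memset(page->data, 0, TOY_PAGE_SIZE);

    page->data[rec->offset] = rec->byte;
}

/* True if every byte except the one at 'offset' is zero. */
static bool toy_clean_except(const ToyPage *page, int offset)
{
    for (int i = 0; i < TOY_PAGE_SIZE; i++)
        if (i != offset && page->data[i] != 0)
            return false;
    return true;
}
```

In this model, a page that was already garbage when the record was written without the flag stays garbage after redo; the same record with the flag set yields a clean page.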

> request past end of log as a DEBUG or NOTICE condition and keep going.
> Sure, it indicates badness somewhere, but we should try to have some
> robustness in the face of that badness. I do not see any reason why
> XLOG has to declare defeat and go home because of this condition.

Ok - what about setting a flag there on restart and aborting the restart
after all records from WAL have been applied? The DBA would then have the
choice either to run pg_resetxlog and try to dump the data, or to restore
from an old backup. I still object to just a NOTICE there - it is easy to
miss. And in normal processing mode I'd leave elog(STOP) there.
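
The proposal above could look roughly like the following C sketch. This is a toy model under stated assumptions, not the actual XLOG code: ToyLsn, ToyRecoveryResult and toy_recover are hypothetical names, and "page LSN greater than end of log" is used here as the stand-in for a flush request past end of log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned long ToyLsn;   /* hypothetical log position type */

typedef enum {
    TOY_RECOVERY_OK,          /* restart may proceed to normal processing */
    TOY_RECOVERY_NEEDS_DBA    /* restart aborted after replay; DBA must decide */
} ToyRecoveryResult;

/*
 * Walk all pages touched during replay. A page whose LSN lies past the end
 * of the log does not stop recovery immediately; it only sets a flag, so
 * the remaining records are still applied. The flag is checked at the end.
 */
static ToyRecoveryResult toy_recover(const ToyLsn *page_lsns, size_t n,
                                     ToyLsn log_end, size_t *bad_pages)
{
    bool saw_bad_lsn = false;
    size_t bad = 0;

    assert(page_lsns != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (page_lsns[i] > log_end) {
            fprintf(stderr, "page %zu: flush request past end of log\n", i);
            saw_bad_lsn = true;
            bad++;
            continue;           /* keep going instead of stopping here */
        }
        /* a healthy page would be flushed normally at this point */
    }

    if (bad_pages != NULL)
        *bad_pages = bad;

    /* Abort the restart only after everything has been replayed. */
    return saw_bad_lsn ? TOY_RECOVERY_NEEDS_DBA : TOY_RECOVERY_OK;
}
```

A caller that gets TOY_RECOVERY_NEEDS_DBA would refuse to start, leaving the DBA to choose between pg_resetxlog plus a dump, or a restore from backup.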

Vadim
P.S. Further discussion will be on the hackers list, sorry.
