Re: Corrupted data, best course of repair?

From:	Sean Chittenden <sean(at)gigave(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-admin(at)postgresql(dot)org
Subject:	Re: Corrupted data, best course of repair?
Date:	2005-08-22 17:46:06
Message-ID:	20050822174606.GA46390@sean.gigave.com
Lists:	pgsql-admin

> They run with fsync off AND they like to toggle the power switch at
> random? I'd suggest finding other employment --- they couldn't
> possibly be paying you enough to justify cleaning up after stupidity
> as gross as that.

Colo-by-windows. If there weren't DBAs with Win32 admin tendencies,
I'd be out of work. :)

> Anyway, the errors appear to indicate that there are pages in the
> database with LSN (last WAL location) larger than the actual current
> end of WAL. The difference is pretty large though --- at least 85MB
> of WAL seems to have gone missing. My first thought was a corrupted
> LSN field. But seeing that there are at least two such pages, and
> given the antics you describe above, what seems more likely is that
> the LSNs were correct when written. I think some page of WAL never
> made it to disk during a period of heavy updates that was terminated
> by a power cycle, and during replay we stopped at the first point
> where the WAL data was detectably corrupt, and so a big chunk of WAL
> never got replayed. Which of course means there's probably a lot of
> stuff that needs to be fixed and did not get fixed, but in
> particular our idea of the current end-of-WAL address is a lot less
> than it should be. If you have the server log from just after the
> last postmaster restart, looking at what terminated the replay might
> confirm this.

Peachy.

> You could get the DB to stop complaining by doing a pg_resetxlog to
> push the WAL start address above the largest "flush request"
> mentioned in any of the messages. But my guess is that you'll find
> a lot of internal corruption after you do it. Going back to the
> dump might be a saner way to proceed.
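
Finding the largest "flush request" by hand is error-prone when the log holds many such messages. The sketch below scans log lines and reports the highest LSN requested. The message wording in the pattern and the sample lines are assumptions for illustration, not taken from this thread's logs; adjust the regular expression to whatever the server actually printed. It only identifies the value that the new WAL start address has to exceed; how to pass that address to pg_resetxlog depends on the PostgreSQL version, so check its documentation before running it.

```python
import re

# Assumed message shape; adapt the pattern to the real server log.
FLUSH_RE = re.compile(
    r"xlog flush request ([0-9A-Fa-f]+)/([0-9A-Fa-f]+) is not satisfied"
)


def largest_flush_request(lines):
    """Return the highest requested LSN as a (high, low) tuple, or None."""
    best = None
    for line in lines:
        match = FLUSH_RE.search(line)
        if match is None:
            continue
        # An LSN is printed as two hex numbers; tuple comparison orders
        # them by the high part first, then the low part.
        lsn = (int(match.group(1), 16), int(match.group(2), 16))
        if best is None or lsn > best:
            best = lsn
    return best


def format_lsn(lsn):
    """Render a (high, low) tuple back into the HEX/HEX form."""
    return "%X/%X" % lsn


# Hypothetical log excerpt, invented for this example.
sample_log = [
    "ERROR:  xlog flush request 2/7A3B4C5D is not satisfied --- flushed only to 2/74F1A2B3",
    "ERROR:  xlog flush request 2/7B000010 is not satisfied --- flushed only to 2/74F1A2B3",
    "LOG:  unrelated line",
]

if __name__ == "__main__":
    top = largest_flush_request(sample_log)
    if top is not None:
        print("largest flush request:", format_lsn(top))
```

In practice the lines would come from the server log file rather than a hard-coded list, for example by passing an open file object to largest_flush_request.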

Tons of corruption and a backup that's a few weeks old. *grin* The
most recent dump seems to have all of the data, but some rows are
there in duplicate. Thanks for the input. -sc
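
A quick way to see which rows are "there in duplicate" before reloading is to count key values in the dumped data. This is a minimal sketch that assumes tab-separated rows, one per line, with the key in the first column; the sample rows and the function name are made up for the example.

```python
from collections import Counter


def duplicated_keys(rows, key_columns=(0,)):
    """Map each key that occurs more than once to its occurrence count.

    rows: iterable of tab-separated text lines (assumed data format).
    key_columns: indexes of the columns that together form the key.
    """
    counts = Counter(
        tuple(row.rstrip("\n").split("\t")[i] for i in key_columns)
        for row in rows
    )
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows, invented for this example.
sample_rows = ["1\talice", "2\tbob", "1\talice", "3\tcarol"]

if __name__ == "__main__":
    for key, n in duplicated_keys(sample_rows).items():
        print(key, "appears", n, "times")
```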

--
Sean Chittenden

In response to

Re: Corrupted data, best course of repair? at 2005-08-22 15:00:14 from Tom Lane
