db recovery after raid5 failure

From: qcor(at)vp(dot)pl
To: pgsql-admin(at)postgresql(dot)org
Subject: db recovery after raid5 failure
Date: 2010-06-16 12:29:53
Message-ID: Q5239608-f07f1390cabb96d99e13d7ef02d1fe13@pmq4.m5r2.onet.test.onet.pl
Lists: pgsql-admin

Hello

I have serious problems recovering our db after a recent RAID5 failure.
Long story short: no recent dumps, and some files (like pg_control) are missing.

Long version of the story:
Our RAID failed... badly. I was able to recover most of the files, but some
(like pg_control) are missing (possibly more, I don't know).
After creating an image of the damaged RAID, I copied all recovered files to a
new system with the same version of PG installed (8.2).
I copied the whole DATA folder and tried to start the PG service (Win XP).
The first error was something about a missing pg_control file. I googled a
solution involving "pg_resetxlog -f ..\data". That went well (I guess): the
file was created.
The second error was something about 'access denied' to pg_control. It turned
out that copying the files had messed up file ownership, so I granted 'rwx'
permission to all users. That went well (I guess).
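
Since it is unclear which files besides pg_control were lost, one way to narrow it down is to compare the copied DATA folder against the top-level layout a data directory normally has. The following is only a minimal sketch, assuming Python is available on the machine; the list of names reflects the usual 8.2 layout as I understand it and is not exhaustive, and `missing_entries` is a hypothetical helper, not something PostgreSQL ships.

```python
import os

# Assumed top-level layout of an 8.2 data directory (illustrative, not exhaustive).
EXPECTED_ENTRIES = [
    "PG_VERSION",
    "postgresql.conf",
    "pg_hba.conf",
    "base",
    "global",
    os.path.join("global", "pg_control"),
    "pg_clog",
    "pg_xlog",
    "pg_multixact",
    "pg_subtrans",
    "pg_tblspc",
    "pg_twophase",
]


def missing_entries(data_dir, expected=EXPECTED_ENTRIES):
    """Return the expected entries that do not exist under data_dir."""
    return [
        name
        for name in expected
        if not os.path.exists(os.path.join(data_dir, name))
    ]

# Example call (the path is a placeholder for the copied DATA folder):
# for name in missing_entries(r"..\data"):
#     print("missing:", name)
```

This only reports whole files or directories that are absent. It says nothing about files that are present but truncated or corrupted.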

The PG service started at last... but...
When I log in using pgAdmin I see 0 (zero) databases and 0 registered
roles. But when I hit 'refresh' after a few seconds, I can see SOME of my old
databases back on the list (registered roles as well). Then, after a few more
seconds of hitting 'refresh', more and more databases are back on the list.
BUT...
There are no tables inside :( All databases contain only 4 pg_xxxx tables.

Out of pure curiosity I tried to restore one of the databases from a
very old dump using "psql dbname <dbfile" and got tons of errors of the form
"create table blabla... fields etc" -> "this relation already exists".
So... is it still there? Why can't I see it? Is there any way to recover it?
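
Whether the table data is still on disk can be checked independently of what pgAdmin shows: each database has a directory under base, named after the database's OID, that holds the files for its tables and indexes. Below is a rough sketch, again assuming Python is available; `summarize_base` is a hypothetical helper name and the DATA folder path is a placeholder.

```python
import os


def summarize_base(data_dir):
    """Map each database directory under base/ to (file_count, total_bytes)."""
    base = os.path.join(data_dir, "base")
    summary = {}
    for entry in sorted(os.listdir(base)):
        db_dir = os.path.join(base, entry)
        if not os.path.isdir(db_dir):
            continue  # skip stray files directly under base/
        files = [
            f for f in os.listdir(db_dir)
            if os.path.isfile(os.path.join(db_dir, f))
        ]
        total = sum(os.path.getsize(os.path.join(db_dir, f)) for f in files)
        summary[entry] = (len(files), total)
    return summary

# Example call (placeholder path):
# for oid, (count, size) in summarize_base(r"..\data").items():
#     print(oid, count, "files,", size, "bytes")
```

Comparing the numbers against a freshly created, empty database on the same server gives a baseline: a directory that is much larger than the baseline probably still contains user data, even if the catalogs no longer list the tables.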

One more thing I noticed in pgAdmin: the owner of the database is shown as
"unknown (oid 17004)", but in "registered roles" I can see
"name: tom, oid:17004... etc".
So it looks like there is a registered owner with the same oid, but for some
reason PG can't find the link.

Oh, I almost forgot: pg_log is full of
" xlog flush request (number) is not satisfied --- flushed only to
(another_number)
writing block 1 of (another_number)
multiple failures --- write error may be permanent"

Can anyone help? 2 days of using Google didn't help much :( YOU are my last
hope...

qc
