Re: Corrupted database's files (linux RAID5 + PostgreSQL 8.3.0)

From: Sim Zacks <sim(at)compulab(dot)co(dot)il>
To: Peter Petrov <peter(at)demabg(dot)com>
Subject: Re: Corrupted database's files (linux RAID5 + PostgreSQL 8.3.0)
Date: 2008-05-21 12:36:22
Message-ID: 48341746.3080902@compulab.co.il
Lists: pgsql-general

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

If you have a backup, the easiest way would be to restore it. There is
also point-in-time recovery: replaying the archived write-ahead log (WAL)
on top of the last base backup, bringing the data forward from the time
of that backup so that you can get your data back. I've never actually
seen it work, though.
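Point-in-time recovery only works if WAL archiving (archive_mode and archive_command in 8.3) was already enabled before the disk failed. A minimal sketch, assuming a hypothetical archive directory /mnt/wal_archive and a hypothetical target timestamp, of the recovery.conf that would be placed in the restored base backup's data directory before starting the server (8.3 renames it to recovery.done once replay finishes):

```python
from typing import Optional

# Hypothetical location of archived WAL segments; not mentioned in the thread.
ARCHIVE_DIR = "/mnt/wal_archive"


def build_recovery_conf(archive_dir: str, target_time: Optional[str] = None) -> str:
    """Return the text of a minimal PostgreSQL 8.3 recovery.conf."""
    # %f is replaced by the WAL file name, %p by the path the server wants it copied to.
    lines = [f"restore_command = 'cp {archive_dir}/%f \"%p\"'"]
    if target_time is not None:
        # Stop replay at this moment instead of at the end of the archived WAL.
        lines.append(f"recovery_target_time = '{target_time}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The timestamp is an illustrative assumption, just before the failure.
    print(build_recovery_conf(ARCHIVE_DIR, "2008-05-20 17:50:00+03"), end="")
```

Leaving out target_time replays everything in the archive, which is usually what you want after a hardware failure.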

Peter Petrov wrote:
> Hi,
>
> Today one of the disks was marked as failed... and now some files
> are corrupted.
> I've decided to copy the pgsqldata directory and try to fix the
> PG_VERSION files (see below for what PostgreSQL doesn't like) to see
> whether the database will come up.
> While the files are copying, I'm open to any other ideas on how to
> deal with the problem ;)
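Working on a copy rather than the original is the safe order. A minimal sketch of that copy step, assuming the server is stopped and the destination directory does not yet exist: shutil.copytree continues past files it cannot read and raises shutil.Error at the end, so files that return I/O errors from the degraded array are reported rather than aborting the whole copy.

```python
import shutil


def copy_data_dir(src, dst):
    """Copy a stopped PostgreSQL data directory from src to dst (dst must not exist).

    Returns a list of (source_path, reason) for files that could not be copied.
    """
    try:
        # symlinks=True copies symlinks (e.g. pg_tblspc entries) as links,
        # not as the directories they point to.
        shutil.copytree(src, dst, symlinks=True)
    except shutil.Error as exc:
        # copytree collects per-file failures as (src, dst, reason) tuples.
        return [(source, reason) for source, _dest, reason in exc.args[0]]
    return []
```

An empty list means every file was read back without an error; it does not prove the contents are intact, as the PG_VERSION case below shows.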
>
> PostgreSQL's log suggests running initdb (the HINT message in the log).
> What will happen if I then try to copy the rest of the structure into
> the newly created database cluster?
>
> linux (Slackware 12.0.0), software RAID5 (partition based) + PostgreSQL
> 8.3.0:
>
> Here's what happened (from dmesg):
>
> ---------------------------------------
> # uname -a
> Linux xeonito 2.6.21.5 #3 SMP Tue Oct 2 16:20:48 EEST 2007 i686 Intel(R)
> Xeon(R) CPU E5335 @ 2.00GHz GenuineIntel GNU/Linux
>
> ---------------------------------------
> # dmesg
> sd 0:0:3:0: SCSI error: return code = 0x08000002
> sdd: Current: sense key=0x4
> ASC=0x44 ASCQ=0x0
> Info fld=0x0
> end_request: I/O error, dev sdd, sector 159620863
> sd 0:0:3:0: SCSI error: return code = 0x08000002
> sdd: Current: sense key=0x4
> ASC=0x44 ASCQ=0x0
> Info fld=0x0
> end_request: I/O error, dev sdd, sector 159617119
> raid5: Disk failure on sdd1, disabling device. Operation continuing on 4
> devices
> ......
>
> RAID5 conf printout:
> --- rd:5 wd:4
> disk 0, o:1, dev:sdb1
> disk 1, o:1, dev:sdc1
> disk 2, o:0, dev:sdd1
> disk 3, o:1, dev:sde1
> disk 4, o:1, dev:sdf1
> RAID5 conf printout:
> --- rd:5 wd:4
> disk 0, o:1, dev:sdb1
> disk 1, o:1, dev:sdc1
> disk 3, o:1, dev:sde1
> disk 4, o:1, dev:sdf1
>
> ---------------------------------------
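When dmesg is long, the two message types that matter here (block-level I/O errors and md disabling a member) can be pulled out mechanically. A minimal sketch, assuming the message formats exactly as quoted above from this 2.6.21 kernel; other kernel versions may word them differently:

```python
import re

# Patterns for the two kinds of kernel messages quoted in the thread.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
RAID_FAILURE = re.compile(r"raid5: Disk failure on (\w+), disabling device")


def summarize_dmesg(text):
    """Return ([(device, sector), ...], [failed md member, ...]) from dmesg text."""
    io_errors = [(dev, int(sector)) for dev, sector in IO_ERROR.findall(text)]
    failed_members = RAID_FAILURE.findall(text)
    return io_errors, failed_members
```

Fed the output above, this reports two I/O errors on sdd (sectors 159620863 and 159617119) and sdd1 as the member md disabled.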
>
> # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
> [raid4] [multipath] [faulty]
> md1 : active raid5 sdb1[0] sdf1[4] sde1[3] sdd1[5](F) sdc1[1]
> 585924608 blocks level 5, 8192k chunk, algorithm 2 [5/4] [UU_UU]
>
> md0 : active raid5 sdb2[0] sdf2[4] sde2[3] sdd2[5](F) sdc2[1]
> 390053888 blocks level 5, 1024k chunk, algorithm 2 [5/4] [UU_UU]
>
> unused devices: <none>
>
> ---------------------------------------
>
> And here's what the partitions look like:
>
> # fdisk -l /dev/sdb
>
> Disk /dev/sdb: 249.8 GB, 249865175040 bytes
> 255 heads, 63 sectors/track, 30377 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
> Device Boot Start End Blocks Id System
> /dev/sdb1 1 18237 146488671 83 Linux
> /dev/sdb2 18238 30377 97514550 83 Linux
>
> ---------------------------------------
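The fdisk cylinder numbers can be turned into sector ranges (16065 sectors per cylinder, with partition 1 starting after the 63-sector first track), which reproduces the Blocks column exactly: (292977404 - 63 + 1) / 2 = 146488671 KiB for sdb1. A sketch that locates the failing sectors from dmesg, assuming sdd is partitioned identically to sdb (only sdb's table is shown) and that the end_request sector numbers count from the start of the whole disk:

```python
SECTORS_PER_CYLINDER = 255 * 63  # 16065, from "Units = cylinders of 16065 * 512"
FIRST_TRACK = 63                 # DOS-style layout: partition 1 begins after track 0

# Start/End cylinders from the fdisk listing of /dev/sdb (assumed same on sdd).
PARTITIONS = {"1": (1, 18237), "2": (18238, 30377)}


def sector_range(start_cyl, end_cyl):
    """Return the (first, last) absolute sector of a cylinder-aligned partition."""
    first = (start_cyl - 1) * SECTORS_PER_CYLINDER
    if first == 0:
        first = FIRST_TRACK
    last = end_cyl * SECTORS_PER_CYLINDER - 1
    return first, last


def partition_for_sector(sector):
    """Return (partition number, offset in sectors inside it), or None if outside."""
    for number, (start, end) in PARTITIONS.items():
        first, last = sector_range(start, end)
        if first <= sector <= last:
            return number, sector - first
    return None
```

Both reported sectors (159620863 and 159617119) fall inside partition 1, which matches the kernel disabling sdd1, the md1 member.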
> Kernel parameters:
>
> echo 4200000000 > /proc/sys/kernel/shmmax
> echo 4200000000 > /proc/sys/kernel/shmall
> sysctl -w vm.overcommit_memory=2
>
> echo 8192 > /sys/block/md0/md/stripe_cache_size
> echo 8192 > /sys/block/md1/md/stripe_cache_size
>
> ---------------------------------------
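Two of these settings are not counted in bytes: kernel.shmall is a number of pages (the PostgreSQL documentation recommends ceil(SHMMAX/PAGE_SIZE)), and md's stripe cache costs one page per member disk per cache entry. A small arithmetic sketch, assuming the usual 4096-byte page on i686 (check with getconf PAGE_SIZE):

```python
PAGE_SIZE = 4096  # assumed x86 page size


def shmall_pages(shmmax_bytes, page_size=PAGE_SIZE):
    """kernel.shmall needed to cover shmmax, i.e. ceil(shmmax / page_size) pages."""
    return -(-shmmax_bytes // page_size)


def stripe_cache_bytes(stripe_cache_size, member_disks, page_size=PAGE_SIZE):
    """RAM used by one md array's stripe cache: entries * page size * member disks."""
    return stripe_cache_size * page_size * member_disks
```

For the values above, shmmax = 4200000000 bytes corresponds to shmall = 1025391 pages, and stripe_cache_size = 8192 on a 5-disk array uses 167772160 bytes (160 MiB) for each of md0 and md1.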
>
>
> Both md0 and md1 are used by PostgreSQL. Initially it was not designed
> to use the whole of disks sdb-sdf, but due to size requirements I also
> added the remaining unused space for PostgreSQL.
>
>
> And here's the PostgreSQL log (the FATAL message appears when I try to
> connect to the database; of course this happens with the most important
> database... some other small databases are working fine):
>
> LOG: received smart shutdown request
> LOG: autovacuum launcher shutting down
> LOG: shutting down
> LOG: database system is shut down
> LOG: could not create IPv6 socket: Address family not supported by
> protocol
> LOG: database system was shut down at 2008-05-20 17:54:17 EEST
> LOG: autovacuum launcher started
> LOG: database system is ready to accept connections
> FATAL: "base/16399" is not a valid data directory
> DETAIL: File "base/16399/PG_VERSION" does not contain valid data.
> HINT: You might need to initdb.
>
> Of course base/16399/PG_VERSION contains something strange instead of
> the version information:
>
> # cat base/16399/PG_VERSION
> X
>
>
> ---------------------------------------
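Each directory under base/ is named after a database's OID (16399 here, which can be matched against pg_database.oid on a working cluster), and its PG_VERSION should contain just the major version, "8.3". A minimal sketch that checks every database directory in a copied data directory at once, assuming the standard 8.3 layout; the data directory path you pass in is your own:

```python
from pathlib import Path

EXPECTED = "8.3"  # major version of this cluster (PostgreSQL 8.3.0)


def bad_pg_version_files(data_dir, expected=EXPECTED):
    """Return {database OID directory: problem} for PG_VERSION files that look wrong."""
    bad = {}
    for version_file in sorted(Path(data_dir).glob("base/*/PG_VERSION")):
        try:
            content = version_file.read_bytes()
        except OSError as exc:
            bad[version_file.parent.name] = f"unreadable: {exc}"
            continue
        # A healthy file holds the version string plus a trailing newline.
        if content.strip() != expected.encode():
            bad[version_file.parent.name] = repr(content)
    return bad
```

A directory flagged here is a sign that other files written to the same stripes may be damaged too, so it is worth checking before only rewriting PG_VERSION.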
>
>
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkg0F0YACgkQjDX6szCBa+r5wwCg5Dzms7G3ipmVaoBbCZd+jPp8
TmIAnRrehvG1m+wvERsZ8J8Xw8v9scO5
=5AgU
-----END PGP SIGNATURE-----
