Re: database corruption

From: Chris Travers <chris(at)travelamericas(dot)com>
To: Ian Westmacott <ianw(at)intellivid(dot)com>, pgsql-admin(at)postgresql(dot)org
Subject: Re: database corruption
Date: 2005-04-16 01:29:13
Message-ID: 42606A69.9010102@travelamericas.com
Lists: pgsql-admin

Hi Ian;

I think it is important to figure out why this is happening.  I would 
not want to run any production databases on systems that were failing 
like this.

I am trying to figure out the likely causes of the errors:

1)  Do any other computers in your building suffer random application 
crashes, power-downs, etc.?
2)  I take it there are no RAID controllers involved?
3)  Is the RAM non-ECC?
4)  Are the systems on UPSes?

If I could make a wild (and probably wrong) guess, I would wonder if 
something external to the system (like the electrical supply) was 
introducing glitches into memory, causing bad data to be written.  I am 
only mentioning it because I have implicated the electrical supply in other 
cases where rare computer failures were affecting many systems.

Ian Westmacott wrote:

>For several weeks now we have been experiencing fairly
>severe database corruption upon clean reboot.  It is very
>repeatable, and the corruption is of the following forms:
>
>ERROR:  could not access status of transaction foo
>DETAIL:  could not open file "bar": No such file or directory
>
>ERROR:  invalid page header in block foo of relation "bar"
>
>ERROR:  uninitialized page in block foo of relation "bar"
>
>
>At first, we believed this was related to XFS, and have
>been pursuing investigations along those lines.  However,
>we have now experienced the exact same problem with JFS.
>
>Here are some details:
>
>- Postgres 7.4.2
>- 2.6.6 kernel.org kernel
>- dedicated database partition
>- repeatable with XFS and JFS (have not seen on ext3)
>- repeatable with and without Linux software RAID 0
>- repeatable with IDE and SATA
>- repeatable with and without fsync, and with fdatasync
>- repeatable on multiple systems
>
>
>I have two questions:
>
>- any known reason why this might be occurring?  (we must
>  have something wrong, for this high rate of severe
>  error).
>
>- if I don't care about losing data, and am not interested
>  in trying to recover anything, how can I arrange for
>  Postgres to proceed normally?  I know about
>  zero_damaged_pages, but this doesn't help with missing
>  transaction files and such.  Is there any way to get
>  Postgres to chuck anything bad and proceed?
>
>Thanks,
>
>	--Ian

