Re: 8.3.5 broken after power fail SOLVED

From: Michael Monnerie <michael(dot)monnerie(at)is(dot)it-management(dot)at>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: 8.3.5 broken after power fail SOLVED
Date: 2009-02-21 12:58:59
Message-ID: 200902211358.59500@zmi.at
Lists: pgsql-admin

On Saturday 21 February 2009 Scott Marlowe wrote:
> We preach this again and again.  PostgreSQL can survive a power
> outage type failure ONLY if the hardware / OS / filesystem don't lie
> about fsync.  If they do, all bets are off, and this kind of failure
> means you should really fail over to another machine or restore a
> backup.

The shit thing is, I discussed with the XFS devs just last week whether
it is safe to run under virtualization like VMware or Xen, and the answer
was "depends on the hypervisor". I had such an issue with VMware 2 years
ago, and now with Xen, so I would say they are not safe. But there must
be something you can configure to avoid such drastic errors on power
failure. It's just that nobody seems to know (or want to tell) how to
do that. At least, not me ;-)

> It's why you have to do possibly destructive tests to see if your
> server stands at least some chance of surviving this kind of failure,
> log shipping for recovery, and / or replication of another form
> (slony etc...) to have a reliable server.
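
Short of pulling the plug, one rough, non-destructive first check is to
time a loop of small write-plus-fsync calls. A single rotating disk
without a battery-backed write cache cannot commit much more often than
once per revolution, so a rate far above rpm/60 hints that some layer
(drive cache, controller, hypervisor) is acknowledging syncs it has not
completed. The following is only a minimal sketch in Python: the function
names and the 7200 rpm default are assumptions for illustration, not part
of any PostgreSQL tool. SSDs and battery-backed caches legitimately
exceed the threshold, and a low rate does not prove the stack is safe.

```python
import os
import tempfile
import time


def measure_fsync_rate(directory, iterations=200):
    """Return completed write+fsync cycles per second in `directory`."""
    # Create the scratch file on the filesystem under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, b"x" * 512)  # small write, like a tiny commit
            os.fsync(fd)              # ask the OS to flush it to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    # Guard against a zero elapsed time on very coarse clocks.
    return iterations / max(elapsed, 1e-9)


def looks_suspicious(rate, rpm=7200):
    """Heuristic (an assumption, not a proof): more syncs per second than
    disk revolutions per second suggests a write cache is answering."""
    return rate > rpm / 60.0


if __name__ == "__main__":
    rate = measure_fsync_rate(".")
    print("fsync rate: %.0f per second" % rate)
    if looks_suspicious(rate):
        print("faster than a plain 7200 rpm disk could honestly sync")
```

Run it from a directory on the same filesystem as the database cluster.
It complements a real power-fail test and does not replace one.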

As I need another Postgres setup anyway, with one server syncing dbmail
to another, I guess I'll do that with WAL shipping, so at least then I
can recover up to the latest entry.

> The recommendations for recovery of data are just that, recovery
> oriented.  They can't fix a broken database at that point.  You need
> to take it offline after this kind of failure if you can't trust your
> hardware.
>
> Usually when it finds something wrong it just won't start up.

The problem was that I wasn't working this week and only did a basic
check that everything was up again. E-mails were arriving, so I thought
it was OK. I was very pissed when, some days later, I found strange
things happening and then saw that a table was broken and had eaten
nearly all e-mails. If only Postgres had at least whined and stopped
working...

I know it's not Postgres' fault that fsync was messed up, but error
recovery should at least have found the problem, at the latest when the
first transaction touched the problematic table, instead of effectively
throwing the data to /dev/null :-(

Best regards, zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
