Re: IDE and write cache

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Mark Lubratt <mark(dot)lubratt(at)indeq(dot)com>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: IDE and write cache
Date: 2004-02-11 15:48:48
Message-ID: Pine.LNX.4.33.0402110839430.32376-100000@css120.ihs.com
Lists: pgsql-admin
On Wed, 11 Feb 2004, Mark Lubratt wrote:

> Interesting discussions on IDE drives and their write caches.
> 
> I have a question...
> 
> You mentioned that you'd see the problem during a large number of 
> concurrent transactions.  My question is, is this a necessary condition 
> for the database crashing when the plug was pulled, or did you need to use 
> a large number of concurrent transactions to "guarantee" that when you 
> pulled the plug, that it would be at an inopportune time?  In other 
> words, is an IDE drive still "more" susceptible to a power outage 
> problem even under light load?

Basically, if the data has been written to WAL, and an fsync issued, and 
the drive has it in cache, but hasn't written it to the platters, and you 
lose power, the database will likely be corrupted and will refuse to 
start up when the machine boots.  Also, of course, some data will be 
lost that was supposedly committed in a transaction.
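
To make the failure mode above concrete, here is a minimal sketch of the kind 
of plug-pull check it implies.  This is not PostgreSQL's WAL code; the 
function names (commit_records, count_intact_records) and the record format 
are made up for illustration.  The idea is only: write a record, fsync, treat 
it as "committed", and after the power comes back see whether everything that 
was acknowledged is really on disk.

```python
import os
import struct

# Hypothetical record format: one 8-byte sequence number per "commit".
RECORD = struct.Struct("<Q")


def commit_records(path, count):
    """Append `count` sequence numbers, calling fsync after each one.

    Returns the last sequence number whose fsync returned, i.e. the last
    "commit" the operating system claimed was durable.
    """
    last_acked = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for seq in range(1, count + 1):
            os.write(fd, RECORD.pack(seq))
            # With write cache enabled, a lying drive may let this return
            # while the data is still only in the drive's cache.
            os.fsync(fd)
            # In a real test this number would be reported to another
            # machine, since the local disk is the thing being tested.
            last_acked = seq
    finally:
        os.close(fd)
    return last_acked


def count_intact_records(path):
    """Count how many consecutive records 1, 2, 3, ... survived in the file."""
    intact = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file, or a torn final record
            (seq,) = RECORD.unpack(chunk)
            if seq != intact + 1:
                break  # gap or garbage: stop counting here
            intact += 1
    return intact
```

If the plug is pulled while commit_records is running and, after reboot, 
count_intact_records reports fewer records than the last number that was 
acknowledged, then something between fsync and the platters claimed a write 
was safe when it was not.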

So, yeah, the reason for having hundreds of open transactions is that it 
widens the window of opportunity for a lying drive to corrupt the database.

So, yes, even under light load, you could have a corrupted database if you 
lose power while a write is happening.  Of course, if the database is 
sitting idle at the time of the power outage then you're ok.

-------------------------------------------------------------------------

Funny little story.  We had an electrician working above our main power 
switch (the big box that switches us from line power, to UPS, to the 
diesel generator) and said electrician clipped a piece of wire that fell 
into the switch, shorting it out, and taking down our entire hosting 
center (think $1,000 a minute...)

As I was walking down a hallway, one of the winders / fox pro guys asked 
me if my machine would come back up when the power came on (it runs on 
dual 36 gig 10krpm SCSI drives under an LSI megaraid with battery backed 
cache, and I'd tested it by pulling the plug before going into production.)  I'd 
been bragging to him about the power plug pull tests it had passed, so of 
course, he's just teasing me.

I told him that as long as the power cut hadn't spiked the box and fried 
anything we were gold.

An hour later when they got the switch fixed and everything came back up, 
my machine came up fine, but the NAS machines that provide the web storage 
behind it (not the database, that's local) took about 10 minutes to fsck 
or mount or whatever it is they do.

So I'm walking by foxpro guy's desk and I casually say "Well, looks like 
my box had some problems coming back up."  He smiles, thinking he's got 
me, the bragging PostgreSQL guy, by the short ones.  "Yeah, seems it boots 
faster than the network storage it sits on.  Just CTRL-ALT-DEL and it was 
up and running fine."  He laughed along with me.  I trust PostgreSQL.  On 
SCSI or RAID with battery backed cache.

