Re: OT (slightly) testing for data loss on an SSD drive due to power failure

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: OT (slightly) testing for data loss on an SSD drive due to power failure
Date: 2011-04-23 02:48:04
Message-ID: 4DB23DE4.7000108@2ndQuadrant.com
Lists: pgsql-performance

On 04/22/2011 10:04 AM, John Rouillard wrote:
> We have a couple of ssd's 2 x 160GB Intel X25-M MLC SATA
> acting as the zil (write journal) and are trying to see if it is safe
> to use for a power fail situation.
>

Well, the quick answer is "no". I've lost several weekends of my life
to recovering information from databases stored on those drives, after
they were corrupted in a crash.

> The testing method is to copy a bunch of files over NFS to the server
> with the zil. When the copy is running along, pull the power to the
> server. The NFS client will stop and if the client got a message that
> block X was written safely to the zil, it will continue writing with
> block X+1. After the server comes back up and the copies
> resume/finish the files are checksummed. If block X went missing, the
> checksums will fail and we will have our proof.
>

Interestingly, you have reinvented parts of the standard script for
testing for data loss, diskchecker.pl:
http://brad.livejournal.com/2116715.html

You can get a few thousand commits per second using that program, which
is enough to fill the drive buffer such that a power pull should
sometimes lose something. I don't think you can do a proper test here
using NFS; you really need something that is executing fsync calls
directly in the same pattern a database server will.
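
To make "executing fsync calls directly" concrete, here is a minimal
sketch in Python of that kind of test. It is not diskchecker.pl and not
its algorithm; the record layout and the names write_records,
last_valid_record, and the ack callback are all hypothetical, chosen
only for illustration. It appends one small record per "commit", calls
fsync, and only then reports the sequence number to the caller.

```python
import os
import struct
import zlib

# Hypothetical record layout: 8-byte sequence number + 4-byte CRC32 of it.
RECORD = struct.Struct("<QI")


def write_records(path, count, ack):
    """Append `count` records, fsync after each one, then call ack(seq)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for seq in range(count):
            payload = struct.pack("<Q", seq)
            os.write(fd, RECORD.pack(seq, zlib.crc32(payload)))
            os.fsync(fd)  # the storage stack claims the record is durable here
            ack(seq)      # only now report the commit to the outside world
    finally:
        os.close(fd)


def last_valid_record(path):
    """Return the highest sequence number readable in order, or -1 if none."""
    last = -1
    with open(path, "rb") as f:
        while True:
            raw = f.read(RECORD.size)
            if len(raw) < RECORD.size:
                break  # end of file or a torn final record
            seq, crc = RECORD.unpack(raw)
            if seq != last + 1 or crc != zlib.crc32(struct.pack("<Q", seq)):
                break  # gap or corrupted record
            last = seq
    return last
```

For the result to mean anything, ack has to record the sequence number
somewhere that survives the power pull, for example by sending it to a
second machine. After the reboot, if last_valid_record(path) is lower
than the last acknowledged number, a write that was reported as synced
never made it to stable storage.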

ZFS is more resilient than most filesystems as far as avoiding file
corruption in this case. But you should still be able to find some
missing transactions that are sitting in the drive cache.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
