Re: What exactly is postgres doing during INSERT/UPDATE ?

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Luke Koops <luke(dot)koops(at)entrust(dot)com>, Joseph S <jks(at)selectacast(dot)net>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: What exactly is postgres doing during INSERT/UPDATE ?
Date: 2009-08-30 17:36:19
Message-ID: 4A9AB893.9040402@mark.mielke.cc
Lists: pgsql-performance
On 08/30/2009 11:40 AM, Merlin Moncure wrote:
> For random writes, raid 5 has to write a minimum of two drives, the
> data being written and parity.  Raid 10 also has to write two drives
> minimum.  A lot of people think parity is a big deal in terms of raid
> 5 performance penalty, but I don't -- relative to what's going on
> in the drive, xor calculation costs (one of the fastest operations in
> computing) are basically zero, and offloaded if you have a hardware
> raid controller.
>
> I bet part of the problem with raid 5 is actually contention, since
> your write to a stripe can conflict with other writes to a different
> stripe.  The other problem with raid 5 that I see is that you don't
> get very much extra protection -- it's pretty scary doing a rebuild
> even with a hot spare (and then you should probably be doing raid 6).
> On read performance RAID 10 wins all day long because more drives can
> be involved.
>    

In real life, with real life writes (i.e. not sequential from the start 
of the disk to the end of the disk), where the stripes on the disk being 
written are not already in RAM (to allow for XOR to be cheap), RAID 5 is 
horrible. I still recall naively playing with software RAID 5 on a three 
disk system and finding write performance to be 20% - 50% less than a 
single drive on its own.

People need to realize that the cost of maintaining parity is not the 
XOR itself - XOR is cheap - the cost is needing to know what is already 
on the drives in the stripe in order to write the parity. Either that 
data is already in cache (which requires a very large cache, or a load 
localized enough to fit entirely in cache), or the write requires 1 or 
more reads before 2 or more writes. Latency is a killer here - latency 
is already the slowest part of the disk, so effectively multiplying 
latency by 2 has a huge impact.
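
To make the "reads before writes" point concrete, here is a minimal 
sketch of a single-block RAID 5 update done as a read-modify-write. It 
is a toy model, not how any particular controller or md driver is 
implemented: the class Raid5Stripe, the helper xor_blocks, and the I/O 
counters are hypothetical names made up for illustration, and no cache 
is modeled, so every access counts as a disk I/O.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


class Raid5Stripe:
    """Toy stripe: N data blocks plus one parity block (hypothetical model)."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        # Parity is the XOR of all data blocks in the stripe.
        self.parity = reduce(xor_blocks, self.data)
        self.reads = 0
        self.writes = 0

    def small_write(self, index: int, new_block: bytes) -> None:
        """Update one data block with nothing cached (read-modify-write)."""
        old_data = self.data[index]      # read 1: old data block
        self.reads += 1
        old_parity = self.parity         # read 2: old parity block
        self.reads += 1

        # The XOR itself is trivial: cancel the old data out of the
        # parity, then fold the new data in.
        new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_block)

        self.data[index] = new_block     # write 1: new data block
        self.writes += 1
        self.parity = new_parity         # write 2: new parity block
        self.writes += 1


stripe = Raid5Stripe([b"\x01\x02", b"\x10\x20", b"\xaa\xbb"])
stripe.small_write(1, b"\xff\x00")

# Parity still matches a full recomputation over the stripe.
assert stripe.parity == reduce(xor_blocks, stripe.data)
print(stripe.reads, stripe.writes)  # prints "2 2"
```

The two writes cannot be issued until both reads have returned, so an 
uncached small write pays for two serialized rounds of disk access 
where a mirrored write pays for one.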

I will never use RAID 5 again unless I have a huge memory-backed cache 
for it to cache writes against. By huge, I mean something approximately 
the size of the data normally read and written. Having 1 Gbyte of RAM 
dedicated to RAID 5 for a 1 Tbyte drive may not be enough.

RAID 1+0, on the other hand, has never disappointed me. Disks are 
cheap, and paying x2 for single disk redundancy is an acceptable price.

Cheers,
mark

-- 
Mark Mielke <mark(at)mielke(dot)cc>

