Re: What exactly is postgres doing during INSERT/UPDATE ?

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Luke Koops <luke(dot)koops(at)entrust(dot)com>, Joseph S <jks(at)selectacast(dot)net>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: What exactly is postgres doing during INSERT/UPDATE ?
Date: 2009-08-30 17:36:19
Message-ID: 4A9AB893.9040402@mark.mielke.cc
Lists: pgsql-performance

On 08/30/2009 11:40 AM, Merlin Moncure wrote:
> For random writes, raid 5 has to write a minimum of two drives, the
> data being written and parity. Raid 10 also has to write two drives
> minimum. A lot of people think parity is a big deal in terms of raid
> 5 performance penalty, but I don't -- relative to what's going on
> in the drive, xor calculation costs (one of the fastest operations in
> computing) are basically zero, and off-lined if you have a hardware
> raid controller.
>
> I bet part of the problem with raid 5 is actually contention, since
> your write to a stripe can conflict with other writes to a different
> stripe. The other problem with raid 5 that I see is that you don't
> get very much extra protection -- it's pretty scary doing a rebuild
> even with a hot spare (and then you should probably be doing raid 6).
> On read performance RAID 10 wins all day long because more drives can
> be involved.
>

In real life, with real-life writes (i.e. not sequential from the start
of the disk to the end of the disk), where the stripes being written are
not already in RAM (which is what makes the XOR cheap), RAID 5 is
horrible. I still recall naively playing with software RAID 5 on a
three-disk system and finding write performance to be 20% - 50% lower
than that of a single drive on its own.

People need to realize that the cost of maintaining parity is not the
XOR itself - XOR is cheap - the cost is needing to know the contents of
the stripe on every drive in order to compute the parity. Either the
stripe is already in cache (which requires a very large cache, or a load
localized enough to fit entirely in cache), or the write requires 1 or
more reads before 2 or more writes. Latency is a killer here - latency
is already the slowest part of a disk access, so effectively doubling it
has a huge impact.
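
To make the read-before-write cost concrete, here is a rough sketch in Python of a single small write to a RAID 5 stripe when nothing is cached. It is a toy model, not how any particular controller or md driver is implemented; the names Raid5Stripe, small_write and xor_blocks are made up for illustration, and the counters only tally logical disk operations.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Raid5Stripe:
    """Toy model of one RAID 5 stripe: N data blocks plus one parity block.

    Hypothetical illustration only. 'reads' and 'writes' count the disk
    operations an uncached small write would need.
    """

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = xor_blocks(*self.data)
        self.reads = 0
        self.writes = 0

    def small_write(self, index: int, new_block: bytes) -> None:
        # Nothing is cached, so the old data block and the old parity
        # block both have to be read from disk first.
        old_data = self.data[index]
        self.reads += 1
        old_parity = self.parity
        self.reads += 1

        # The XOR itself is the cheap part.
        new_parity = xor_blocks(old_parity, old_data, new_block)

        # Two writes: the data block and the parity block.
        self.data[index] = new_block
        self.writes += 1
        self.parity = new_parity
        self.writes += 1


stripe = Raid5Stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
stripe.small_write(1, b"\xaa\xbb")
print(stripe.reads, stripe.writes)
```

In this model one logical write turns into two reads followed by two writes, and the writes cannot start until the reads have returned. A mirrored pair would issue its two writes without reading anything first.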

I will never use RAID 5 again unless I have a huge memory-backed cache
for it to cache writes against. By huge, I mean something approximately
the size of the data normally read and written. Having 1 Gbyte of RAM
dedicated to RAID 5 for a 1 Tbyte drive may not be enough.

RAID 1+0, on the other hand, has never disappointed me. Disks are
cheap, and paying x2 for single disk redundancy is an acceptable price.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>
