Re: Performance while loading data and indexing

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Mats Lofkvist <mal(at)algonet(dot)se>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Performance while loading data and indexing
Date: 2002-09-27 15:16:03
Message-ID: Pine.LNX.4.33.0209270907500.9417-100000@css120.ihs.com
Lists: pgsql-general pgsql-hackers pgsql-performance

On 27 Sep 2002, Mats Lofkvist wrote:

> shridhar_daithankar(at)persistent(dot)co(dot)in ("Shridhar Daithankar") writes:
>
> [snip]
> >
> > Couple MB of data per sec. to disk is just not saturating it. It's a RAID 5
> > setup..
> >
>
> RAID5 is not the best for performance, especially write performance.
> If it is software RAID it is even worse :-).

I take exception to this. RAID5 is a great choice for most folks.

1: RAID5 only writes out the parity stripe and data stripe, not all
stripes when writing. So, in an 8 disk RAID5 array, writing to a single
64k stripe involves two 64k reads (the old data and the old parity,
unless they are already cached) and two 64k writes.

On a mirror set, writing to one 64k stripe involves two 64k writes. The
difference isn't that great, and in my testing, a large enough RAID5
provides so much faster read speeds by spreading the reads across so many
heads as to more than make up for the slightly slower writes. My testing
has shown that a 4 disk RAID5 can generally run at about 85% or more of
the speed of a mirror set.
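
To make the parity bookkeeping in point 1 concrete, here is a minimal
sketch in Python of how a single-chunk RAID5 update can recompute parity
with XOR. It is an illustration only, not how the Linux md driver is
actually written. The names (CHUNK, xor_bytes, full_parity,
update_parity) are made up for this example, and the chunk size is
shrunk to 16 bytes to keep it readable. Note that the incremental update
needs the old data chunk as well as the old parity chunk.

```python
CHUNK = 16  # bytes per chunk; a tiny stand-in for the 64k chunks in the text


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(chunks: list[bytes]) -> bytes:
    """Parity computed from scratch: XOR of every data chunk in the stripe."""
    parity = bytes(len(chunks[0]))  # all zero bytes
    for chunk in chunks:
        parity = xor_bytes(parity, chunk)
    return parity


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Incremental update: only the changed chunk and the parity are touched."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# An 8 disk array holds 7 data chunks plus 1 parity chunk per stripe.
data = [bytes([i + 1]) * CHUNK for i in range(7)]
parity = full_parity(data)

# Overwrite one chunk: read old data + old parity, write new data + new parity.
new_chunk = bytes([0xAB]) * CHUNK
new_parity = update_parity(parity, data[3], new_chunk)
data[3] = new_chunk

# The shortcut gives the same parity as recomputing over all 7 chunks.
assert new_parity == full_parity(data)
```

The other six data disks are never touched, which is the point being
made above: the cost of a small write does not grow with the number of
disks in the array.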

2: Why does EVERYONE have to jump on the bandwagon that software RAID 5
is bad? My workstation running RH 7.2 uses about 1% of the CPU at most
during very heavy parallel access (i.e. 50 simultaneous pgbench runs).
I've seen many hardware RAID cards that are noticeably slower than my
workstation running software RAID. You do know that hardware RAID is just
software RAID where the processing is done on a separate CPU on a card?
It's still software doing the work.

3: We just had a hardware RAID card mark both drives in a mirror set bad.
It wouldn't accept them back, and all the data was gone. Poof. That
would never happen in Linux's kernel software RAID; I can always make
Linux take back a "bad" drive.

The only difference between RAID5 with n+1 disks and RAID0 with n disks is
that we have to write a parity stripe in RAID5. Its ability to handle
high parallel load is much better than a RAID1 set, and on average, you
actually write about the same amount with either RAID1 or RAID5.
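
As a rough back-of-the-envelope check of the "about the same amount"
claim, the sketch below tallies the bytes moved for one 64k chunk update
on each layout. It is plain arithmetic under simple assumptions (a two
way mirror, a single-chunk RAID5 update, no write coalescing), and the
function names are invented for this example.

```python
CHUNK = 64 * 1024  # 64k, the chunk size used in the discussion above


def raid0_write_io(chunk: int = CHUNK) -> dict[str, int]:
    """RAID0: the chunk is written once, with no redundancy."""
    return {"bytes_read": 0, "bytes_written": chunk}


def raid1_write_io(chunk: int = CHUNK, mirrors: int = 2) -> dict[str, int]:
    """RAID1: the same chunk is written to every mirror."""
    return {"bytes_read": 0, "bytes_written": mirrors * chunk}


def raid5_write_io(chunk: int = CHUNK, old_data_cached: bool = False) -> dict[str, int]:
    """RAID5 single-chunk update: new data plus new parity are written.

    The old parity must be read; the old data must be read too unless it
    is already in cache (an assumption the caller controls here).
    """
    reads = chunk if old_data_cached else 2 * chunk
    return {"bytes_read": reads, "bytes_written": 2 * chunk}


for name, io in [
    ("RAID0", raid0_write_io()),
    ("RAID1", raid1_write_io()),
    ("RAID5", raid5_write_io()),
]:
    print(f"{name}: read {io['bytes_read'] // 1024}k, wrote {io['bytes_written'] // 1024}k")
```

Under these assumptions RAID1 and RAID5 both write two chunks per
update; the extra cost of RAID5 shows up as reads, not as additional
writes.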

Don't dog software RAID5; it works, and it works well in Linux. Windows,
however, is another issue. There, the software RAID5 is pretty pitiful,
both in terms of performance and maintenance.
