Re: Huge Data sets, simple queries

From: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
To: "hubert depesz lubaczewski" <depesz(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Huge Data sets, simple queries
Date: 2006-01-29 18:44:08
Message-ID: 3E37B936B592014B978C4415F90D662D023F28D9@MI8NYCMAIL06.Mi8.com
Lists: pgsql-performance

Depesz,

> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of
> hubert depesz lubaczewski
> Sent: Sunday, January 29, 2006 3:25 AM
>
> hmm .. do i understand correctly that you're suggesting that
> using raid 10 and/or hardware raid adapter might hurt disc
> subsystem performance? could you elaborate on the reasons,
> please? it's not that i'm against the idea - i'm just curious
> as this is very "against-common-sense". and i always found it
> interesting when somebody states something that uncommon...

See previous postings on this list - often when someone reports a
performance problem with large data, the answer comes back that their
I/O setup is not performing well. Most of the time, people trust that
once they buy a hardware RAID adapter and set it up, the performance
will be what they expect and what is theoretically correct for the
number of disk drives.

In fact, in our testing of various host-based SCSI RAID adapters (LSI,
Dell PERC, Adaptec, HP SmartArray), we find that *all* of them
underperform, most of them severely. Some produce results slower than a
single disk drive. We've found that some external SCSI RAID adapters,
those built into the disk chassis, often perform better. I think this
might be due to better drivers and perhaps a different marketplace for
the higher-end solutions, one that drives performance validation.

The important lesson we've learned is to always test the I/O subsystem
performance - you can do so with a simple test like:

  time bash -c "dd if=/dev/zero of=bigfile bs=8k count=4000000 && sync"
  time dd if=bigfile of=/dev/null bs=8k

If the measured rate isn't close to the theoretical rate for your
number of drives, you are likely limited by your RAID setup, and you
might be shocked to find a severe performance problem. In that case,
switching to software RAID on a simple SCSI adapter will fix the
problem.
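
To make "close to the theoretical rate" concrete, here is a minimal sketch of how the dd timings could be turned into a throughput figure and compared against an expected rate. The function names, the 200-second timing, the 8-drive array, and the 60 MB/s per-drive rate are all made-up values for illustration, and "number of drives times per-drive rate" is a simplifying assumption that ignores parity and mirroring overhead.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for interpreting the dd timings; the function names and
# the "drives x per-drive rate" model are illustrative assumptions.

# throughput_mb_s BYTES SECONDS
# Prints the transfer rate in MB/s (1 MB = 10^6 bytes), one decimal place.
throughput_mb_s() {
    LC_ALL=C awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# percent_of_theoretical MEASURED_MB_S DRIVES PER_DRIVE_MB_S
# Prints the measured rate as a whole-number percentage of DRIVES * PER_DRIVE_MB_S.
percent_of_theoretical() {
    LC_ALL=C awk -v m="$1" -v n="$2" -v d="$3" 'BEGIN { printf "%.0f\n", 100 * m / (n * d) }'
}

# The dd test writes count * bs = 4000000 * 8192 bytes.
bytes=$((4000000 * 8192))

# Example with made-up numbers: the read pass took 200 s on 8 drives that
# each sustain about 60 MB/s on their own.
rate=$(throughput_mb_s "$bytes" 200)
echo "measured: ${rate} MB/s, $(percent_of_theoretical "$rate" 8 60)% of theoretical"
```

Substitute the "real" time reported by each dd run and the sustained rate of one of your own drives. The test file should also be considerably larger than RAM so the read pass is not served from cache.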

BTW - we've had very good experiences with the host-based SATA adapters
from 3Ware. The Areca controllers are also respected.

Oh - and about RAID 10 - for large data work it's more often a waste of
disk performance-wise compared to RAID 5 these days. RAID 5 will almost
double the performance of RAID 10 on a reasonable number of drives.
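
One way to see where "almost double" could come from is a rough model of sequential reads. The message does not spell out its reasoning, so the model here is an assumption: a RAID 10 array of N drives streams from one drive of each mirrored pair (N/2 drives), while a RAID 5 array streams N-1 drives' worth of data per stripe. The function names below are hypothetical.

```shell
#!/usr/bin/env bash
# streaming_drives LEVEL N
# Prints how many of N drives contribute bandwidth to a sequential read under
# the simplified model described above (an assumption, not a measurement).
streaming_drives() {
    case "$1" in
        10) echo $(( $2 / 2 )) ;;   # one drive of each mirrored pair
        5)  echo $(( $2 - 1 )) ;;   # N-1 drives' worth of data per stripe
        *)  echo "unsupported RAID level: $1" >&2; return 1 ;;
    esac
}

# raid5_vs_raid10 N
# Prints the RAID 5 : RAID 10 ratio of streaming drives, two decimals.
raid5_vs_raid10() {
    LC_ALL=C awk -v r5="$(streaming_drives 5 "$1")" -v r10="$(streaming_drives 10 "$1")" \
        'BEGIN { printf "%.2f\n", r5 / r10 }'
}

for n in 4 8 16; do
    echo "$n drives: RAID 5 / RAID 10 = $(raid5_vs_raid10 "$n")"
done
```

Under this model the ratio approaches 2 as the drive count grows. It says nothing about small random writes, where RAID 5 pays a parity-update penalty, or about controllers that balance reads across both halves of a mirror.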

- Luke
