Re: SAN vs Internal Disks

From: "Harsh Azad" <harsh(dot)azad(at)gmail(dot)com>
To: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: SAN vs Internal Disks
Date: 2007-09-06 17:03:49
Message-ID: a199704d0709061003u71d536fbh9d71ed095dfe16b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks Scott, we have now requested IBM/EMC to provide test machines.
Interestingly since you mentioned the importance of Raid controllers and the
drivers; we are planning to use Cent OS 5 for hosting the DB.

Firstly, I could only find postgres 8.1.x RPM for CentOS 5, could not find
any RPM for 8.2.4. Is there any 8.2.4 RPM for CentOS 5?

Secondly, would investing into Redhat enterprise edition give any
performance advantage? I know all the SAN boxes are only certified on RHEL
and not CentOS. Or since CentOS is similar to RHEL it would be fine?

Regards,
Harsh

On 9/6/07, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
>
> On 9/6/07, Harsh Azad <harsh(dot)azad(at)gmail(dot)com> wrote:
> > Hi,
> >
> > We are currently running our DB on a DualCore, Dual Proc 3.Ghz Xeon, 8GB
> > RAM, 4x SAS 146 GB 15K RPM on RAID 5.
> >
> > The current data size is about 50GB, but we want to purchase the
> hardware to
> > scale to about 1TB as we think our business will need to support that
> much
> > soon.
> > - Currently we have a 80% read and 20% write percentages.
>
> For this type load, you should be running on RAID10 not RAID5. Or, if
> you must use RAID 5, use more disks and have a battery backed caching
> RAID controller known to perform well with RAID5 and large arrays.
>
> > - Currently with this configuration the Database is showing signs of
> > over-loading.
>
> On I/O or CPU? If you're running out of CPU, then look to increasing
> CPU horsepower and tuning postgresql.
> If I/O then you need to look into a faster I/O subsystem.
>
> > - Auto-vaccum, etc run on this database, vaccum full runs nightly.
>
> Generally speaking, if you need to run vacuum fulls, you're doing
> something wrong. Is there a reason you're running vacuum full or is
> this just precautionary. vacuum full can bloat your indexes, so you
> shouldn't run it regularly. reindexing might be a better choice if
> you do need to regularly shrink your db. The better option is to
> monitor your fsm usage and adjust fsm settings / autovacuum settings
> as necessary.
>
> > - Currently CPU loads are about 20%, memory utilization is full (but
> this
> > is also due to linux caching disk blocks) and IO waits are frequent.
> > - We have a load of about 400 queries per second
>
> What does vmstat et. al. say about CPU versus I/O wait?
>
> > Now we are considering to purchase our own servers and in the process
> are
> > facing the usual dilemmas. First I'll list out what machine we have
> decided
> > to use:
> > 2x Quad Xeon 2.4 Ghz (4-way only 2 populated right now)
> > 32 GB RAM
> > OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1
> > (Data Storage mentioned below)
> >
> > We have already decided to split our database into 3 machines on the
> basis
> > on disjoint sets of data. So we will be purchasing three of these boxes.
> >
> > HELP 1: Does something look wrong with above configuration, I know there
> > will be small differences b/w opetron/xeon. But do you think there is
> > something against going for 2.4Ghz Quad Xeons (clovertown i think)?
>
> Look like good machines, plenty fo memory.
>
> > HELP 2: The main confusion is with regards to Data Storage. We have the
> > option of going for:
> >
> > A: IBM N-3700 SAN Box, having 12x FC 300GB disks, Partitioned into 3
> disks
> > into RAID-4 for WAL/backup, and 9 disks on RAID-DP for data, 2 hot
> spare. We
> > are also considering similar solution from EMC - CX310C.
> >
> > B: Go for Internal of DAS based storage. Here for each server we should
> be
> > able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for
> > tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks
> on
> > RAID-10 single table-space.
> >
> > What do I think? Well..
> > SAN wins on manageability, replication (say to a DR site), backup,
> etc...
> > DAS wins on cost
>
> The problem with SAN is that it's apparently very easy to build a big
> expensive system that performs poorly. We've seen reports of such
> here on the lists a few times. I would definitely demand an
> evaluation period from your supplier to make sure it performs well if
> you go SAN.
>
> > But for a moment keeping these aside, i wanted to discuss, purely on
> > performance side which one is a winner? It feels like internal-disks
> will
> > perform better, but need to understand a rough magnitude of difference
> in
> > performance to see if its worth loosing the manageability features.
>
> That really really really depends. The quality of RAID controllers
> for either setup is very important, as is the driver support, etc...
> All things being even, I'd lean towards the local storage.
>
> > Also if we choose to go with DAS, what would be the best tool to do
> async
> > replication to DR site and maybe even as a extra plus a second read-only
> DB
> > server to distribute select loads.
>
> Look at slony, or PITR with continuous recovery. Of those two, I've
> only used Slony in production, and I was very happy with it's
> performance, and it was very easy to write a bash script to monitor
> the replication for failures.
>

--
Harsh Azad
=======================
Harsh(dot)Azad(at)gmail(dot)com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joshua D. Drake 2007-09-06 17:12:22 Re: SAN vs Internal Disks
Previous Message Harsh Azad 2007-09-06 16:58:45 Re: SAN vs Internal Disks