Re: Hardware HD choice...

From: "Scott Carey" <scott(at)richrelevance(dot)com>
To: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
Cc: Lionel <lionel(at)art-informatique(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hardware HD choice...
Date: 2008-10-24 02:48:34
Message-ID: a1ec7d000810231948w1c436619h307e6ac19b88111b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

If you are doing batch inserts of data, and want to have reporting queries
concurrently running, make sure you have the pg_xlogs on a different disk
than the data/indexes. 2 drives RAID 1 for OS + xlogs works great (and
these can be SAS if you choose, have a separate partition -- ext2 if it is
linux -- for the xlogs. Then you can easily go with storage capacity and
SATA for the main reporting portion. You just don't want the inserts in
batches to slow the whole thing to a crawl due to xlog writes on the same
drive array as the reporting.
However, if you can only get a few disks, it is a lot harder to choose
between one large array and two of them split without experimenting with
both on real data and queries. It is a quick and easy performance win if
you have 6+ disks and do enough writes.

Also, if you intend to have lots of data organized by a time field, and
expect to do the reporting/aggregation queries on subsets of that data
bounded by time, partitioning by time can have huge benefits. Partition by
month, for example, and sequential scans will only flow to the months of
interest if the queries have the right lmits on the date in the where
clause.

Partitioning WILL take more development and tuning time, so don't do it
unless you know you need it... though if the reporting is mostly restricted
to time windows, the impact it has on improving runtimes of aggregation
queries is immense. However, partitioning won't help at all until you have
enough data to justify it.

On Thu, Oct 23, 2008 at 10:16 AM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>wrote:

> On Thu, Oct 23, 2008 at 10:38 AM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
> wrote:
> >>
> >> Any other better option that I could ask for ?
> >
> > Yes, more drives. 4 drives in a RAID10 is a good start. If you could
> > get 8 or 12 in one that's even better.
> >
>
> Note that for transactional databases SAS drives are usually
> noticeably better, but for reporting databases, SATA drives are
> generally fine, with 70-80% the sustained transfer rate at less than
> half the cost per megabyte. I'd recommend 8 SATA drives over 4 SAS
> drives for a reporting database. You'll spend about the same on twice
> the number of drives but you'll get much more storage, which is often
> useful when you need to work with large datasets.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2008-10-24 05:41:49 Re: Hardware HD choice...
Previous Message Guy Rouillier 2008-10-23 23:18:35 Re: Annoying Reply-To