Re: PostgreSQL, OLAP, and Large Clusters

From: Ryan Kelly <rpkelly22(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL, OLAP, and Large Clusters
Date: 2012-09-27 18:50:03
Message-ID: 20120927185003.GD25296@llserver.lakeliving.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Sep 26, 2012 at 03:18:16PM -0600, Scott Marlowe wrote:
> On Wed, Sep 26, 2012 at 5:50 AM, Ryan Kelly <rpkelly22(at)gmail(dot)com> wrote:
> > Hi:
> >
> > The size of our database is growing rather rapidly. We're concerned
> > about how well Postgres will scale for OLAP-style queries over terabytes
> > of data. Googling around doesn't yield great results for vanilla
> > Postgres in this application, but generally links to other software like
> > Greenplum, Netezza, and Aster Data (some of which are based off of
> > Postgres). Too, there are solutions like Stado. But I'm concerned about
> > the amount of effort to use such solutions and what we would have to
> > give up feature-wise.
>
> If you want fastish OLAP on postgres you need to do several things.
>
> 1: Throw very fast disk arrays at it. Lots of spinners in a linux SW
> RAID-10 or RAID-0 if your data is easily replaceable work wonders
> here.
> 2: Throw lots of memory at it. Memory is pretty cheap. 256G is not
> unusual for OLAP machines
> 3: Throw fast CPUs at it. Faster CPUs, especially fewer faster cores,
> are often helpful.
What do you mean by "fewer faster cores"? Wouldn't "more faster cores"
be better?

> Applied in that order you can get some pretty impressive results.
>
> A lot of OLAP stuff needs to read hundreds of gigs at a time from the
> drive array. An array of 32 15kRPM drives, each reading at ~100MB/s
> or better can flood your PCI bus at 3200MB/s for reads. Note that
> most RAID controllers aren't as fast for sequential reads on large
> drive arrays. Although a battery backed cache can GREATLY improved
> parallel write speed, it doesn't really make a big difference for big
> sequential stuff and usually gets in the way here.
>
> Memory to cache as much as possible and allow all your queries to do
> hash joins etc in memory (crank up work_mem as needed, but be careful
> not to use all your memory up.)
>
> Lastly once you've thrown lots of IO and memory at it, a faster CPU
> can make a bit of a difference too. But honestly I'd rather have a
> dual core 2GHz CPU on top of 32 spinners with 256G than a 3.6GHz 8
> core CPU on top of 4 drives and 32G of RAM.
All of this seems like great advice.

Thanks,
-Ryan Kelly

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ryan Kelly 2012-09-27 18:54:46 Re: PostgreSQL, OLAP, and Large Clusters
Previous Message Ryan Kelly 2012-09-27 18:47:27 Re: PostgreSQL, OLAP, and Large Clusters