Re: PostgreSQL, OLAP, and Large Clusters

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Ryan Kelly <rpkelly22(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL, OLAP, and Large Clusters
Date: 2012-09-28 00:03:03
Message-ID: CAOR=d=2qZyRAddH=K3sd6EBjiLbaqrLya5-J5wgzBiHOK2dRCA@mail.gmail.com
Lists: pgsql-general

On Thu, Sep 27, 2012 at 12:50 PM, Ryan Kelly <rpkelly22(at)gmail(dot)com> wrote:
> On Wed, Sep 26, 2012 at 03:18:16PM -0600, Scott Marlowe wrote:
>> On Wed, Sep 26, 2012 at 5:50 AM, Ryan Kelly <rpkelly22(at)gmail(dot)com> wrote:
>> > Hi:
>> >
>> > The size of our database is growing rather rapidly. We're concerned
>> > about how well Postgres will scale for OLAP-style queries over terabytes
>> > of data. Googling around doesn't yield great results for vanilla
>> > Postgres in this application, but generally links to other software like
>> > Greenplum, Netezza, and Aster Data (some of which are based off of
>> > Postgres). Too, there are solutions like Stado. But I'm concerned about
>> > the amount of effort to use such solutions and what we would have to
>> > give up feature-wise.
>>
>> If you want fastish OLAP on postgres you need to do several things.
>>
>> 1: Throw very fast disk arrays at it. Lots of spinners in a linux SW
>> RAID-10 or RAID-0 if your data is easily replaceable work wonders
>> here.
>> 2: Throw lots of memory at it. Memory is pretty cheap. 256G is not
>> unusual for OLAP machines
>> 3: Throw fast CPUs at it. Faster CPUs, especially fewer faster cores,
>> are often helpful.
> What do you mean by "fewer faster cores"? Wouldn't "more faster cores"
> be better?

If you can have, say, 32 Opteron cores at 2.2GHz each or 8 Xeon cores
at 3.3GHz each for about the same money, get the 8 faster Xeon cores,
because under PostgreSQL each connection uses only one core. There is no
built-in parallelism to spread a query across a greater number of cores.

Also, on machines with 2 or 4 sockets there are overhead costs for
accessing different memory banks, so if you're never going to have more
than a handful of users / queries running at once, you're usually
better off with a single-socket fast CPU with, say, 8 cores.
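
To put rough numbers on that trade-off, here is a minimal Python sketch.
It assumes one busy core per running query and treats clock speed as a
stand-in for per-core performance, which ignores memory, I/O, and
architecture differences. The function and constant names are made up
for illustration only.

```python
# Crude model: each running query keeps exactly one core busy,
# and a core's speed is approximated by its clock rate in MHz.

OPTERON = (32, 2200)  # (cores, MHz per core), figures from the example above
XEON = (8, 3300)


def aggregate_mhz(cores: int, mhz: int, concurrent_queries: int) -> int:
    """Total MHz actually in use when each query occupies one core."""
    busy_cores = min(cores, concurrent_queries)
    return busy_cores * mhz


def compare(concurrent_queries: int) -> tuple[int, int]:
    """Return (opteron_total, xeon_total) for a given number of queries."""
    return (
        aggregate_mhz(*OPTERON, concurrent_queries),
        aggregate_mhz(*XEON, concurrent_queries),
    )


if __name__ == "__main__":
    for queries in (1, 4, 8, 12, 13, 32):
        opteron_total, xeon_total = compare(queries)
        print(f"{queries:>2} queries: opteron={opteron_total} xeon={xeon_total}")
```

Under these assumptions the 8-core box is ahead or level until about 12
queries run at once, and any single query always runs on the faster core.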
