Re: Hardware/OS recommendations for large databases (

From: David Lang <dlang(at)invendra(dot)net>
To: Brendan Duddridge <brendan(at)clickspace(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hardware/OS recommendations for large databases (
Date: 2005-11-28 04:01:05
Message-ID: Pine.LNX.4.62.0511271940400.2807@qnivq.ynat.uz
Lists: pgsql-performance
On Mon, 28 Nov 2005, Brendan Duddridge wrote:

> Hi David,
>
> Thanks for your reply. So how is that different than something like Slony2 or 
> pgcluster with multi-master replication? Is it similar technology? We're 
> currently looking for a good clustering solution that will work on our Apple 
> Xserves and Xserve RAIDs.

MPP doesn't just split up the data, it splits up the processing as well. 
so if you have a 5 machine cluster, each machine holds 1/5 of your data 
(plus a backup for one of the other machines), and when you do a query MPP 
slices and dices the query to send a subset of it to each machine. 
it then gets the responses from all the machines and combines them.
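
to make the "1/5 of your data plus a backup" layout concrete, here is a minimal sketch in Python. the ring scheme (each machine backs up its neighbour's slice) and the names place_slices and surviving_slices are assumptions made up for illustration; how Greenplum actually assigns backups isn't described here.

```python
# Hypothetical layout: machine m holds its own slice as primary and a
# backup copy of the next machine's slice (ring order). This is an
# illustrative assumption, not Greenplum's documented placement scheme.

def place_slices(num_machines):
    """Return {machine: {"primary": slice_id, "backup": slice_id}}."""
    layout = {}
    for machine in range(num_machines):
        layout[machine] = {
            "primary": machine,
            "backup": (machine + 1) % num_machines,
        }
    return layout


def surviving_slices(layout, failed_machine):
    """Slices still readable somewhere after one machine fails."""
    available = set()
    for machine, held in layout.items():
        if machine != failed_machine:
            available.add(held["primary"])
            available.add(held["backup"])
    return available


layout = place_slices(5)
# Machine 2 fails; its slice is still readable from machine 1's backup copy.
print(sorted(surviving_slices(layout, failed_machine=2)))  # [0, 1, 2, 3, 4]
```

with this layout any single machine can fail and every slice is still held by some surviving machine.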

if you have to do a full table scan, for example, each machine would only 
have to go through 20% of the data.

a Slony or pgcluster setup has each machine with a full copy of all the 
data, only one machine can work on a given query at a time, and if you 
have to do a full table scan one machine needs to read 100% of the data.
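
the difference between the two approaches can be sketched as a toy scatter-gather count in Python. this is only a model under stated assumptions: the "machines" are threads in one process, the rows are a plain list, and partition, scan_slice, partitioned_count and replicated_count are hypothetical names, not anything from MPP, Slony or pgcluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_machines):
    """Deal rows out round-robin so each 'machine' holds 1/N of the data."""
    return [rows[i::num_machines] for i in range(num_machines)]


def scan_slice(rows, predicate):
    """Full scan of one slice: returns (matching rows, rows read)."""
    return sum(1 for r in rows if predicate(r)), len(rows)


def partitioned_count(slices, predicate):
    """Scatter the scan to every slice, then combine the partial counts."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(lambda s: scan_slice(s, predicate), slices))
    total = sum(count for count, _ in partials)
    most_rows_read = max(read for _, read in partials)
    return total, most_rows_read


def replicated_count(rows, predicate):
    """Replicated setup: one machine with a full copy scans everything."""
    return scan_slice(rows, predicate)


rows = list(range(1000))
is_even = lambda r: r % 2 == 0

print(partitioned_count(partition(rows, 5), is_even))  # (500, 200)
print(replicated_count(rows, is_even))                 # (500, 1000)
```

both give the same answer, but in the partitioned case the busiest machine reads 200 rows (20%) while the single replicated machine reads all 1000.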

in many ways this is the holy grail of databases. almost all other areas 
of computing can now be scaled by throwing more machines at the problem in 
a cluster, with each machine just working on its piece of the problem, 
but databases have had serious trouble doing the same and so have been 
ruled by the 'big monster machine'. Oracle has been selling Oracle RAC for 
a few years, and reports from people who have used it range drastically 
(from "it works great" to "it's a total disaster"), in part depending on the 
types of queries that have been made.

Greenplum thinks that they have licked the problems for the more general 
case (and that commodity networks are now fast enough to match disk speeds 
in processing the data). if they are right, then when they hit full release 
with the new version they should be cracking a lot of the 
price/performance records on the big database benchmarks (TPC and 
similar), and if their pricing is reasonable, they may be breaking them by 
an order of magnitude or more. it's not unusual for the top machines to 
spend more than $1,000,000 on just their disk arrays for those 
systems; MPP could conceivably put together a cluster of $5K machines 
that runs rings around them (and probably will for at least some of the 
subtests, the big question is if they can sweep the board and take the top 
spots outright).

they have more details (and marketing stuff) on their site at 
http://www.greenplum.com/prod_deepgreen_cluster.html

don't get me wrong, I am very impressed with their stuff, but (having 
ranted a little here on the list about them) I think MPP and its 
performance is a bit off topic for the postgres performance list (at least 
until the postgres project itself starts implementing similar features :-)

David Lang

> Thanks,
>
> ____________________________________________________________________
> Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan(at)clickspace(dot)com
>
> ClickSpace Interactive Inc.
> Suite L100, 239 - 10th Ave. SE
> Calgary, AB  T2G 0V9
>
> http://www.clickspace.com
>
> On Nov 27, 2005, at 8:09 PM, David Lang wrote:
>
>> On Mon, 28 Nov 2005, Brendan Duddridge wrote:
>> 
>>> Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it 
>>> possible to upgrade from Postgres 8.1 to Bizgres?
>> 
>> MPP is the Greenplum proprietary extension to postgres that spreads the 
>> data over multiple machines (raid, but with entire machines not just 
>> drives, complete with data replication within the cluster to survive a 
>> machine failing)
>> 
>> for some types of queries they can definitely scale linearly with the 
>> number of machines (other queries are far more difficult and the overhead 
>> of coordinating the machines shows more. this is one of the key things that 
>> the new version they recently announced the beta for is supposed to be 
>> drastically improving)
>> 
>> early in the year when I first looked at them their prices were exorbitant, 
>> but Luke says I'm wildly mistaken on their current prices so call them for 
>> details
>> 
>> it uses the same interfaces as postgres so it should be a drop in 
>> replacement to replace a single server with a cluster.
>> 
>> it's fascinating technology to read about.
>> 
>> I seem to remember reading that one of the other postgres companies is also 
>> producing a clustered version of postgres, but I don't remember who and 
>> know nothing about them.
>> 
>> David Lang
>> 
>
>
