Re: Hardware/OS recommendations for large databases (

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>, "Postgresql Performance" <pgsql-performance(at)postgresql(dot)org>, "David Lang" <dlang(at)invendra(dot)net>
Subject: Re: Hardware/OS recommendations for large databases (
Date: 2005-11-27 19:31:24
Message-ID: BFAF498C.14869%llonergan@greenplum.com
Lists: pgsql-performance

Stephan,

On 11/27/05 7:48 AM, "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com> wrote:

> On Sun, 27 Nov 2005, Luke Lonergan wrote:
>
>> Has anyone done the math on the original post? 5TB takes how long to
>> scan once? If you want to wait less than a couple of days just for a
>> seq scan, you'd better be in the multi-gb per second range.
>
> Err, I get about 31 megabytes/second to do 5TB in 170,000 seconds. I think
> perhaps you were exaggerating a bit or adding additional overhead not
> obvious from the above. ;)

Thanks - the calculator on my blackberry was broken ;-)

> At 1 gigabyte per second, 1 terabyte should take about 1000 seconds
> (between 16 and 17 minutes). The impressive 3.2 gigabytes per second
> listed before (if it actually scans consistently at that rate) puts it at
> a little over 5 minutes for 1 terabyte, I believe, so about 26 minutes for
> 5 terabytes. The 200 megabyte per second number puts it at about 7 hours
> for 5 terabytes AFAICS.

7 hours, days, same thing ;-)
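
As a rough check on the figures quoted above, here is a minimal Python sketch of the arithmetic. It assumes decimal units (1 TB = 10**12 bytes) and a perfectly constant scan rate; the helper names are made up for illustration.

```python
# Back-of-the-envelope sequential scan times.
# Assumption: decimal units, 1 TB = 10**12 bytes, 1 GB/s = 10**9 bytes/s.
TB = 10**12


def scan_seconds(table_bytes, bytes_per_second):
    """Seconds needed to read table_bytes once at a constant rate."""
    return table_bytes / bytes_per_second


def describe(seconds):
    """Render a duration in the largest convenient unit."""
    if seconds >= 86400:
        return f"{seconds / 86400:.1f} days"
    if seconds >= 3600:
        return f"{seconds / 3600:.1f} hours"
    return f"{seconds / 60:.1f} minutes"


table = 5 * TB
# Rates mentioned in the thread: 200 MB/s, 1 GB/s, 3.2 GB/s.
for label, rate in [("200 MB/s", 200e6), ("1 GB/s", 1e9), ("3.2 GB/s", 3.2e9)]:
    print(label, "->", describe(scan_seconds(table, rate)))
```

Under those assumptions the 200 MB/s case comes out just under 7 hours and the 3.2 GB/s case at about 26 minutes, in line with Stephan's numbers.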

On the reality of sustained scan rates like that:

We're getting 2.5GB/s sustained on a 2-year-old machine with 16 hosts and 96
disks. We run them in RAID0, which is only OK because MPP has built-in
host-to-host mirroring for fault management.

We just purchased a 4-way cluster with 8 drives each using the 3Ware 9550SX.
Our thought was to try the simplest approach first, which is a single RAID5,
which gets us 7 drives' worth of capacity and performance. As I posted
earlier, we get about 400MB/s seq scan rate on the RAID, but the current
Postgres 8.0 scan rate limit is 64% of 400MB/s, or 256MB/s per host. The 8.1
mods (thanks Qingqing and Tom!) may increase that significantly toward the
400 max - we've already merged the 8.1 codebase into MPP so we'll also
feature the same enhancements.
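
The per-host figure is just the raw array rate scaled by the fraction the executor can consume. A small sketch, with a hypothetical helper name and the 400MB/s and 64% values taken from the paragraph above:

```python
def effective_scan_mb_s(raw_mb_s, efficiency_pct):
    """Scan rate the database sees, given the raw array throughput
    and the percentage of it the executor can actually consume."""
    return raw_mb_s * efficiency_pct / 100


raw = 400  # MB/s seq scan rate measured on the RAID5 array
pg80 = effective_scan_mb_s(raw, 64)  # Postgres 8.0 limit quoted above

print("effective:", pg80, "MB/s per host")
print("headroom left for 8.1 to recover:", raw - pg80, "MB/s")
```

This only restates the numbers in the post; how much of the remaining headroom 8.1 actually recovers is not something the sketch can predict.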

Our next approach is to run these machines in a split RAID0 configuration,
or RAID0 on 4 and 4 drives. We then run an MPP "segment instance" bound to
each CPU and I/O channel. At that point, we'll have all 8 drives of
performance and capacity per host and we should get 333MB/s with current MPP
and perhaps over 400MB/s with MPP/8.1. That would get us up to the 3.2GB/s
for 8 hosts.
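
The cluster-wide number is the per-host rate times the host count, assuming scan throughput scales linearly across hosts (an idealization). A quick sketch with a hypothetical helper, using the 8-host figure and the per-host rates from the paragraph above:

```python
def aggregate_gb_s(per_host_mb_s, hosts):
    """Cluster scan rate in GB/s, assuming perfectly linear scaling."""
    return per_host_mb_s * hosts / 1000


HOSTS = 8  # host count used for the 3.2GB/s figure in the post

for label, per_host in [("current MPP", 333), ("MPP/8.1", 400)]:
    print(label, "->", round(aggregate_gb_s(per_host, HOSTS), 2), "GB/s")
```

The 400MB/s case gives the 3.2GB/s quoted above; at 333MB/s per host the same 8 hosts would come in a bit under 2.7GB/s.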

Even better, all operators are executed on all CPUs for each query, so
sorting, hashing, aggregation, etc. are run on all CPUs in the cluster.

- Luke
