Re: Hardware/OS recommendations for large databases (

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: "eng(at)intranet(dot)greenplum(dot)com" <eng(at)intranet(dot)greenplum(dot)com>
Subject: Re: Hardware/OS recommendations for large databases (
Date: 2005-11-28 16:26:21
Message-ID: BFB06FAD.148EC%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

The MPP test I ran was with the release version 2.0 of MPP which is based on
Postgres 8.0, the upcoming 2.1 release is based on 8.1, and 8.1 is far
faster at seq scan + agg. 12,937MB were counted in 4.5 seconds, or 2890MB/s
from I/O cache. That's 722MB/s per host, and 360MB/s per Postgres instance,
up from 267 previously with 8.0.3.

I'm going to apply Tom's pre-8.2 seq scan locking optimization and see how
much better we can get!

- Luke

==========================================================
Bizgres MPP CVS tip (2.1 pre), 8 data segments, 1 per CPU
==========================================================

llonergan=# \timing
Timing is on.
llonergan=# explain select count(1) from lineitem;
QUERY PLAN
--------------------------------------------------------------------------
Aggregate (cost=0.01..0.01 rows=1 width=0)
-> Gather Motion (cost=0.01..0.01 rows=1 width=0)
-> Aggregate (cost=0.01..0.01 rows=1 width=0)
-> Seq Scan on lineitem (cost=0.00..0.00 rows=1 width=0)
(4 rows)

Time: 1.464 ms
llonergan=# select count(1) from lineitem;
count
----------
59986052
(1 row)

Time: 4478.563 ms
llonergan=# select count(1) from lineitem;
count
----------
59986052
(1 row)

Time: 4550.917 ms
llonergan=# select count(1) from lineitem;
count
----------
59986052
(1 row)

Time: 4482.261 ms

On 11/24/05 9:16 AM, "Luke Lonergan" <LLonergan(at)greenplum(dot)com> wrote:

> The same 12.9GB distributed across 4 machines using Bizgres MPP fits into
> I/O cache. The interesting result is that the query "select count(1)" is
> limited in speed to 280 MB/s per CPU when run on the lineitem table. So
> when I run it spread over 4 machines, one CPU per machine I get this:
>
> ======================================================
> Bizgres MPP, 4 data segments, 1 per 2 CPUs
> ======================================================
> llonergan=# explain select count(1) from lineitem;
> QUERY PLAN
> ----------------------------------------------------------------------------
> ----------
> Aggregate (cost=582452.00..582452.00 rows=1 width=0)
> -> Gather Motion (cost=582452.00..582452.00 rows=1 width=0)
> -> Aggregate (cost=582452.00..582452.00 rows=1 width=0)
> -> Seq Scan on lineitem (cost=0.00..544945.00 rows=15002800
> width=0)
> (4 rows)
>
> llonergan=# \timing
> Timing is on.
> llonergan=# select count(1) from lineitem;
> count
> ----------
> 59986052
> (1 row)
>
> Time: 12191.435 ms
> llonergan=# select count(1) from lineitem;
> count
> ----------
> 59986052
> (1 row)
>
> Time: 11986.109 ms
> llonergan=# select count(1) from lineitem;
> count
> ----------
> 59986052
> (1 row)
>
> Time: 11448.941 ms
> ======================================================
>
> That's 12,937 MB in 11.45 seconds, or 1,130 MB/s. When you divide out the
> number of Postgres instances (4), that's 283MB/s per Postgres instance.
>
> To verify that this has nothing to do with MPP, I ran it in a special
> internal mode on one instance and got the same result.
>
> So - we should be able to double this rate by running one segment per CPU,
> or two per host:
>
> ======================================================
> Bizgres MPP, 8 data segments, 1 per CPU
> ======================================================
> llonergan=# select count(1) from lineitem;
> count
> ----------
> 59986052
> (1 row)
>
> Time: 6484.594 ms
> llonergan=# select count(1) from lineitem;
> count
> ----------
> 59986052
> (1 row)
>
> Time: 6156.729 ms
> llonergan=# select count(1) from lineitem;
> count
> ----------
> 59986052
> (1 row)
>
> Time: 6063.416 ms
> ======================================================
> That's 12,937 MB in 11.45 seconds, or 2,134 MB/s. When you divide out the
> number of Postgres instances (8), that's 267MB/s per Postgres instance.
>
> So, if you want to "select count(1)", using more CPUs is a good idea! For
> most complex queries, having lots of CPUs + MPP is a good combo.
>
> Here is an example of a sorting plan - this should probably be done with a
> hash aggregation, but using 8 CPUs makes it go 8x faster:
>
> ======================================================
> Bizgres MPP, 8 data segments, 1 per CPU
> ======================================================
> llonergan=# \timing
> Timing is on.
> llonergan=# explain select l_orderkey from lineitem order by l_shipdate,
> l_extendedprice limit 10;
> QUERY PLAN
> ----------------------------------------------------------------------------
> -----
> Limit (cost=0.01..0.02 rows=1 width=24)
> -> Gather Motion (cost=0.01..0.02 rows=1 width=24)
> Merge Key: l_shipdate, l_extendedprice
> -> Limit (cost=0.01..0.02 rows=1 width=24)
> -> Sort (cost=0.01..0.02 rows=1 width=24)
> Sort Key: l_shipdate, l_extendedprice
> -> Seq Scan on lineitem (cost=0.00..0.00 rows=1
> width=24)
> (7 rows)
>
> Time: 0.592 ms
> llonergan=# select l_orderkey from lineitem order by l_shipdate,
> l_extendedprice limit 10;
> l_orderkey
> ------------
> 51829667
> 26601603
> 16579717
> 40046023
> 41707078
> 22880928
> 35584422
> 31272229
> 49914018
> 42309990
> (10 rows)
>
> Time: 93469.443 ms
>
> ======================================================
>
> So that's 60M rows and 12.9GB sorted in 93.5 seconds.
>
> - Luke
>
>

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2005-11-28 18:10:07 Re: Hardware/OS recommendations for large databases (
Previous Message Merlin Moncure 2005-11-28 13:31:06 Re: Newbie question: ultra fast count(*)