Re: Reliability recommendations

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Dan Gorman" <dgorman(at)hi5(dot)com>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Reliability recommendations
Date: 2006-02-25 04:41:52
Message-ID: C0252010.1DC7A%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dan,

On 2/24/06 4:47 PM, "Dan Gorman" <dgorman(at)hi5(dot)com> wrote:

> Was that sequential reads? If so, yeah you'll get 110MB/s? How big
> was the datafile size? 8MB? Yeah, you'll get 110MB/s. 2GB? No, they
> can't sustain that. There are so many details missing from this test
> that it's hard to have any context around it :)
>
> I was getting about 40-50MB/s on a PV with 14 disks on a RAID10 in
> real world usage. (random IO and fully saturating a Dell 1850 with 4
> concurrent threads (to peg the cpu on selects) and raw data files)

OK, how about some proof?

In a synthetic test that writes 32GB of sequential 8k pages on a machine
with 16GB of RAM:
========================= Write test results ==============================
time bash -c "dd if=/dev/zero of=/dbfast1/llonergan/bigfile bs=8k
count=2000000 && sync" &
time bash -c "dd if=/dev/zero of=/dbfast3/llonergan/bigfile bs=8k
count=2000000 && sync" &

2000000records in
2000000records out
2000000records in
2000000records out

real 1m0.046s
user 0m0.270s
sys 0m30.008s

real 1m0.047s
user 0m0.287s
sys 0m30.675s

So that's 32,000 MB written in 60.05 seconds, which is 533MB/s sustained
with two threads.

Now to read the same files in parallel:
========================= Read test results ==============================
sync
time dd of=/dev/null if=/dbfast1/llonergan/bigfile bs=8k &
time dd of=/dev/null if=/dbfast3/llonergan/bigfile bs=8k &

2000000records in
2000000records out

real 0m39.849s
user 0m0.282s
sys 0m22.294s
2000000records in
2000000records out

real 0m40.410s
user 0m0.251s
sys 0m22.515s

And that's 32,000MB in 40.4 seconds, or 792MB/s sustained from disk (not
memory).

These are each RAID5 arrays of 8 internal SATA disks on 3Ware HW RAID
controllers.

Now for real usage, let's run a simple sequential scan query on 123,434 MB
of data in a single table on 4 of these machines in parallel. All tables
are distributed evenly by Bizgres MPP over all 8 filesystems:

============= Bizgres MPP sequential scan results =========================

[llonergan(at)salerno]$ !psql
psql -p 9999 -U mppdemo1 demo
Welcome to psql 8.1.1 (server 8.1.3), the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

demo=# \timing
Timing is on.
demo=# select version();

version
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-----
PostgreSQL 8.1.3 (Bizgres MPP 2.1) on x86_64-unknown-linux-gnu, compiled by
GCC gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) compiled on Feb 23 2006
11:34:06
(1 row)

Time: 0.570 ms
demo=# select relname,8*relpages/128 as MB from pg_class order by relpages
desc limit 6;
relname | mb
---------------------------------------
lineitem | 123434
orders | 24907
partsupp | 14785
part | 3997
customer | 3293
supplier | 202
(6 rows)

Time: 1.824 ms
demo=# select count(*) from lineitem;
count
-----------
600037902
(1 row)

Time: 60300.960 ms

So that's 123,434 MB of data scanned in 60.3 seconds, or 2,047 MB/s on 4
machines, which uses 512MB/s of disk bandwidth on each machine.

Now let's do a query that uses a this big table (a two way join) using all 4
machines:
============= Bizgres MPP Query results =========================
demo=# select
demo-# sum(l_extendedprice* (1 - l_discount)) as revenue
demo-# from
demo-# lineitem,
demo-# part
demo-# where
demo-# (
demo(# p_partkey = l_partkey
demo(# and p_brand = 'Brand#42'
demo(# and p_container in ('SM CASE', 'SM BOX', 'SM PACK',
'SM PKG')
demo(# and l_quantity >= 7 and l_quantity <= 7 10
demo(# and p_size between 1 and 5
demo(# and l_shipmode in ('AIR', 'AIR REG')
demo(# and l_shipinstruct = 'DELIVER IN PERSON'
demo(# )
demo-# or
demo-# (
demo(# p_partkey = l_partkey
demo(# and p_brand = 'Brand#15'
demo(# and p_container in ('MED BAG', 'MED BOX', 'MED PKG',
'MED PACK')
demo(# and l_quantity >= 14 and l_quantity <= 14 10
demo(# and p_size between 1 and 10
demo(# and l_shipmode in ('AIR', 'AIR REG')
demo(# and l_shipinstruct = 'DELIVER IN PERSON'
demo(# )
demo-# or
demo-# (
demo(# p_partkey = l_partkey
demo(# and p_brand = 'Brand#53'
demo(# and p_container in ('LG CASE', 'LG BOX', 'LG PACK',
'LG PKG')
demo(# and l_quantity >= 22 and l_quantity <= 22 10
demo(# and p_size between 1 and 15
demo(# and l_shipmode in ('AIR', 'AIR REG')
demo(# and l_shipinstruct = 'DELIVER IN PERSON'
demo(# );
revenue
----------------
356492404.3164
(1 row)

Time: 114908.149 ms

And now a 6-way join among 4 tables in this same schema:

demo=# SELECT
demo-#
s.s_acctbal,s.s_name,n.n_name,p.p_partkey,p.p_mfgr,s.s_address,s.s_phone,s.s
_comment
demo-# FROM
demo-# supplier s,partsupp ps,nation n,region r,
demo-# part p, (
demo(# SELECT p_partkey, min(ps_supplycost) as
min_ps_cost from part, partsupp ,
demo(# supplier,nation, region
demo(# WHERE
demo(# p_partkey=ps_partkey
demo(# and s_suppkey = ps_suppkey
demo(# and s_nationkey = n_nationkey
demo(# and n_regionkey = r_regionkey
demo(# and r_name = 'EUROPE'
demo(# GROUP BY
demo(# p_partkey
demo(# ) g
demo-# WHERE
demo-# p.p_partkey = ps.ps_partkey
demo-# and g.p_partkey = p.p_partkey
demo-# and g. min_ps_cost = ps.ps_supplycost
demo-# and s.s_suppkey = ps.ps_suppkey
demo-# and p.p_size = 15
demo-# and p.p_type like '%BRASS'
demo-# and s.s_nationkey = n.n_nationkey
demo-# and n.n_regionkey = r.r_regionkey
demo-# and r.r_name = 'EUROPE'
demo-# ORDER BY
demo-# s. s_acctbal desc,n.n_name,s.s_name,p.p_partkey
demo-# LIMIT 100;
s_acctbal | s_name | n_name |
p_partkey | p_mfgr |
s_address | s_phone |
s_comment

----------------------------------------------------------------------
----------------------------------
------------------------------------------------------------------------
--------------------------------------
-------------------------------------------
9999.70 | Supplier#000239544 | UNITED KINGDOM |
6739531 | Manufacturer#4 | 1UCMu
3TLyUThghoeZ8arg6cV3Mr | 33-509-584-9496 | carefully ironic
asymptotes cajole quickly. slyly silent a
ccounts sleep. fl
...
...
9975.53 | Supplier#000310136 | ROMANIA |
10810115 | Manufacturer#5 | VNWON
A5Sr B | 29-977-903-6199 | pending deposits
wake permanently; final accounts sleep ab
out the pending deposits.
(100 rows)

Time: 424981.813 ms

- Luke

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Kirkwood 2006-02-25 06:10:55 Re: Reliability recommendations
Previous Message Bruce Momjian 2006-02-25 03:42:39 Re: