Re: Reliability recommendations

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Ron" <rjpeace(at)earthlink(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Reliability recommendations
Date: 2006-02-25 15:49:51
Message-ID: C025BCA0.1DCCD%llonergan@greenplum.com
Lists: pgsql-performance

Ron,

On 2/25/06 3:24 AM, "Ron" <rjpeace(at)earthlink(dot)net> wrote:

>> These are each RAID5 arrays of 8 internal SATA disks on 3Ware HW RAID
>> controllers.
>
> Impressive IO rates. A more detailed HW list would help put them in context.
>
> Which 3Ware? The 9550SX? How much cache on it (AFAIK, the only
> options are 128MB and 256MB?)?
>
> Which HDs?
>
> What CPUs (looks like Opterons, but which flavor?) and mainboard?
>
> What's CPU utilization when hammering the physical IO subsystem this hard?

OK. There are four machines. Each machine has:
Qty 2 of 3Ware 9550SX/PCIX/128MB cache SATAII RAID controllers
Qty 2 of AMD Opteron 250 CPUs (2.4 GHz)
Qty 16 of 1GB PC3200 RAM (16GB total)
Qty 1 of Tyan 2882S Motherboard
Qty 16 Western Digital 400GB Raid Edition 2 SATA disks
Cost: About $10,000 each

They are connected together using a Netgear 48 port Gigabit Ethernet switch
and copper CAT6 cables.
Cost: About $1,500

Total of all machines:
8 CPUs
64GB RAM
64 Disks
20TB storage in RAID5
Total HW cost: About $45,000.
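The 20TB figure can be reconciled with the disk list above; a quick back-of-the-envelope sketch (assuming one 8-disk RAID5 array per controller, as stated earlier in the thread, and vendor-decimal 400GB disks):

```python
# Sketch: usable capacity of the cluster's RAID5 arrays.
# Assumptions: one 8-disk RAID5 array per 3Ware controller, decimal GB disks.
machines = 4
arrays_per_machine = 2     # one per 3Ware controller
disks_per_array = 8
disk_gb = 400              # vendor (decimal) gigabytes

# RAID5 leaves (n - 1) disks of usable space per n-disk array
usable_gb = machines * arrays_per_machine * (disks_per_array - 1) * disk_gb
usable_tib = usable_gb * 1e9 / 2**40
print(f"{usable_gb} GB = {usable_tib:.1f} TiB usable")  # 22400 GB = 20.4 TiB
```

The decimal-GB vs. binary-TiB difference is why 64 x 400GB disks come out to "20TB" after parity.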

CPU utilization, computed as (usr + system) / real, is approximately:
Writing at 2,132MB/s: 51%
Reading at 3,168MB/s: 56%
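Those percentages come from trivial arithmetic on the timing output; a minimal sketch (the usr/system/real values below are illustrative placeholders, not the actual measurements):

```python
# Sketch: CPU utilization as (usr + system) / real, from `time`-style output.
# The timing values below are placeholders, not the measured numbers.

def cpu_utilization(user_s: float, system_s: float, real_s: float) -> float:
    """Fraction of wall-clock time the CPUs spent in user or kernel mode."""
    return (user_s + system_s) / real_s

# e.g. a run that burned 30s user + 21s system over 100s of wall clock
util = cpu_utilization(user_s=30.0, system_s=21.0, real_s=100.0)
print(f"{util:.0%}")  # 51%
```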

>> revenue
>> ----------------
>> 356492404.3164
>> (1 row)
>>
>> Time: 114908.149 ms
> Hmmm. ~115secs @ ~500MBps => ~57.5GB of data manipulated.

Actually, this query sequentially scans all of both lineitem and part, so it's
accessing 127.5GB of data, and performing the work of the hash join, in
about double the scan time. That's possible because we're using all 8 CPUs,
4 network interfaces and all 64 disks while performing the query:

Aggregate  (cost=3751996.97..3751996.99 rows=1 width=22)
  ->  Gather Motion  (cost=3751996.97..3751996.99 rows=1 width=22)
        ->  Aggregate  (cost=3751996.97..3751996.99 rows=1 width=22)
              ->  Hash Join  (cost=123440.49..3751993.62 rows=1339 width=22)
                    Hash Cond: ("outer".l_partkey = "inner".p_partkey)
                    Join Filter: ((("inner".p_brand = 'Brand#42'::bpchar) AND (("inner".p_container = 'SM CASE'::bpchar) OR ("inner".p_container = 'SM BOX'::bpchar) OR ("inner".p_container = 'SM PACK'::bpchar) OR ("inner".p_container = 'SM PKG'::bpchar)) AND ("outer".l_quantity >= 7::numeric) AND ("outer".l_quantity <= 17::numeric) AND ("inner".p_size <= 5)) OR (("inner".p_brand = 'Brand#15'::bpchar) AND (("inner".p_container = 'MED BAG'::bpchar) OR ("inner".p_container = 'MED BOX'::bpchar) OR ("inner".p_container = 'MED PKG'::bpchar) OR ("inner".p_container = 'MED PACK'::bpchar)) AND ("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size <= 10)) OR (("inner".p_brand = 'Brand#53'::bpchar) AND (("inner".p_container = 'LG CASE'::bpchar) OR ("inner".p_container = 'LG BOX'::bpchar) OR ("inner".p_container = 'LG PACK'::bpchar) OR ("inner".p_container = 'LG PKG'::bpchar)) AND ("outer".l_quantity >= 22::numeric) AND ("outer".l_quantity <= 32::numeric) AND ("inner".p_size <= 15)))
                    ->  Redistribute Motion  (cost=0.00..3340198.25 rows=2611796 width=36)
                          Hash Key: l_partkey
                          ->  Seq Scan on lineitem  (cost=0.00..3287962.32 rows=2611796 width=36)
                                Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
                    ->  Hash  (cost=95213.75..95213.75 rows=2500296 width=36)
                          ->  Seq Scan on part  (cost=0.00..95213.75 rows=2500296 width=36)
                                Filter: (p_size >= 1)
(13 rows)
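The "about double the scan time" claim can be checked with quick arithmetic on the figures quoted in this thread (a sketch, not a measurement):

```python
# Sketch: effective data rate of the hash-join query, from the thread's figures.
data_gb = 127.5      # lineitem + part scanned, per the post
query_s = 114.908    # reported query time for the first query
read_mb_s = 3168.0   # aggregate sequential read rate across the 4 machines

effective_mb_s = data_gb * 1024 / query_s
pure_scan_s = data_gb * 1024 / read_mb_s
print(f"effective rate: {effective_mb_s:.0f} MB/s")
print(f"a pure scan would take ~{pure_scan_s:.0f}s of the {query_s:.0f}s total")
```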

>> Time: 424981.813 ms
> ...and this implies ~425secs @ ~500MBps => 212.5GB

I've attached this explain plan (rather than inlining it) because it's large.
The same thing applies as in the previous query: all of the involved tables
are scanned, and in this case we've got all manner of CPU work being
performed: sorts, hashes, aggregations, and so on.

> What are the IO rates during these joins?

They burst then disappear during the progression of the queries. These are
now CPU bound because the I/O available to each CPU is so fast.

> How much data is being handled to complete these joins?

See above, basically all of it because there are no indexes.

> How much data is being exchanged between these machines to complete the joins?

Good question, you can infer it from reading the EXPLAIN plans, but the
stats are not what they appear - generally multiply them by 8.
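If the "multiply by 8" rule of thumb reflects per-segment planner estimates (an assumption about this Greenplum setup: 8 segments, i.e. 2 per machine), scaling them to cluster totals is a one-liner:

```python
# Sketch: scaling per-segment EXPLAIN row estimates to cluster totals.
# Assumes 8 segments, per the "multiply by 8" rule of thumb in the post.
SEGMENTS = 8

def cluster_rows(per_segment_rows: int, segments: int = SEGMENTS) -> int:
    """Total rows implied by a per-segment planner estimate."""
    return per_segment_rows * segments

# e.g. the lineitem Seq Scan estimate from the plan above
print(cluster_rows(2611796))  # 20894368
```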

> What is the connectivity between these 4 machines?

See above.

> Putting these numbers in context may help the advocacy effort
> considerably as well as help us improve things even further. ;-)

Thanks! (bait bait) I can safely say that there have never been query
speeds with Postgres this fast, not even close. (bait bait)

What's next is a machine 4 or maybe 8 times this fast.

Explain for second query is attached, compressed with gzip (it's kinda
huge).

- Luke

Attachment Content-Type Size
q2-explain.txt.gz application/octet-stream 1.1 KB
