Low perfomance SUM and Group by large databse

From: "Sergio Charpinel Jr(dot)" <sergiocharpinel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Low perfomance SUM and Group by large databse
Date: 2010-06-21 14:42:14
Message-ID: AANLkTin7cGA3b3P6gPvptI9OaD8jf9O5oFWBfKcUocTT@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I'm getting low performance on SUM and GROUP BY queries.
How can I improve my database to perform such queries.

Here is my table schema:
=> \d acct_2010_25
Tabela "public.acct_2010_25"
Coluna | Tipo |
Modificadores
----------------+-----------------------------+------------------------------------------------------------------------
ip_src | inet | not null default
'0.0.0.0'::inet
ip_dst | inet | not null default
'0.0.0.0'::inet
as_src | bigint | not null default 0
as_dst | bigint | not null default 0
port_src | integer | not null default 0
port_dst | integer | not null default 0
tcp_flags | smallint | not null default 0
ip_proto | smallint | not null default 0
packets | integer | not null
flows | integer | not null default 0
bytes | bigint | not null
stamp_inserted | timestamp without time zone | not null default '0001-01-01
00:00:00 BC'::timestamp without time zone
stamp_updated | timestamp without time zone |
Índices:
"acct_2010_25_pk" PRIMARY KEY, btree (stamp_inserted, ip_src, ip_dst,
port_src, port_dst, ip_proto)
"ibytes_acct_2010_25" btree (bytes)

Here is my one query example (could add pk to flow and packet fields):

=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags,
ip_proto,SUM("bytes"),SUM("packets"),SUM("flows") FROM "acct_2010_25" WHERE
"stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21 10:10'
GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto order by
SUM(bytes) desc LIMIT 50 OFFSET 0;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3998662.81..3998662.94 rows=50 width=50) (actual
time=276981.107..276981.133 rows=50 loops=1)
-> Sort (cost=3998662.81..4001046.07 rows=953305 width=50) (actual
time=276981.105..276981.107 rows=50 loops=1)
Sort Key: sum(bytes)
-> GroupAggregate (cost=3499863.27..3754872.33 rows=953305
width=50) (actual time=165468.257..182677.580 rows=8182616 loops=1)
-> Sort (cost=3499863.27..3523695.89 rows=9533049 width=50)
(actual time=165468.022..168908.828 rows=9494165 loops=1)
Sort Key: ip_src, port_src, ip_dst, port_dst,
tcp_flags, ip_proto
-> Seq Scan on acct_2010_25 (cost=0.00..352648.10
rows=9533049 width=50) (actual time=0.038..50860.391 rows=9494165 loops=1)
Filter: ((stamp_inserted >= '2010-06-20
10:10:00'::timestamp without time zone) AND (stamp_inserted < '2010-06-21
10:10:00'::timestamp without time zone))
Total runtime: 278791.661 ms
(9 registros)

Another one just summing bytes (still low):

=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags,
ip_proto,SUM("bytes") FROM "acct_2010_25" WHERE
"stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21 10:10'
GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto LIMIT 50
OFFSET 0;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3395202.50..3395213.12 rows=50 width=42) (actual
time=106261.359..106261.451 rows=50 loops=1)
-> GroupAggregate (cost=3395202.50..3602225.48 rows=974226 width=42)
(actual time=106261.357..106261.435 rows=50 loops=1)
-> Sort (cost=3395202.50..3419558.14 rows=9742258 width=42)
(actual time=106261.107..106261.169 rows=176 loops=1)
Sort Key: ip_src, port_src, ip_dst, port_dst, tcp_flags,
ip_proto
-> Seq Scan on acct_2010_25 (cost=0.00..367529.72
rows=9742258 width=42) (actual time=0.073..8058.598 rows=9494165 loops=1)
Filter: ((stamp_inserted >= '2010-06-20
10:10:00'::timestamp without time zone) AND (stamp_inserted < '2010-06-21
10:10:00'::timestamp without time zone))
Total runtime: 109911.882 ms
(7 registros)

The server has 2 Intel(R) Xeon(R) CPU E5430 @ 2.66GHz and 16GB RAM.
I'm using PostgreSQL 8.1.18 default config from Centos 5.5 (just
increased checkpoint_segments to 50).

What can I change to increase performance?

Thanks in advance.

Cheers.

--
Sergio Roberto Charpinel Jr.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alvaro Herrera 2010-06-21 16:36:54 Re: Aggressive autovacuuming ?
Previous Message Kevin Grittner 2010-06-21 13:53:44 Re: HashAggregate slower than sort?