Skip site navigation (1) Skip section navigation (2)

Low perfomance SUM and Group by large databse

From: "Sergio Charpinel Jr(dot)" <sergiocharpinel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Low perfomance SUM and Group by large databse
Date: 2010-06-21 14:42:14
Message-ID: AANLkTin7cGA3b3P6gPvptI9OaD8jf9O5oFWBfKcUocTT@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
Hi,

I'm getting low performance on SUM and GROUP BY queries.
How can I improve my database to perform such queries.

Here is my table schema:
=> \d acct_2010_25
                                             Tabela "public.acct_2010_25"
     Coluna     |            Tipo             |
Modificadores
----------------+-----------------------------+------------------------------------------------------------------------
 ip_src         | inet                        | not null default
'0.0.0.0'::inet
 ip_dst         | inet                        | not null default
'0.0.0.0'::inet
 as_src         | bigint                      | not null default 0
 as_dst         | bigint                      | not null default 0
 port_src       | integer                     | not null default 0
 port_dst       | integer                     | not null default 0
 tcp_flags      | smallint                    | not null default 0
 ip_proto       | smallint                    | not null default 0
 packets        | integer                     | not null
 flows          | integer                     | not null default 0
 bytes          | bigint                      | not null
 stamp_inserted | timestamp without time zone | not null default '0001-01-01
00:00:00 BC'::timestamp without time zone
 stamp_updated  | timestamp without time zone |
├Źndices:
    "acct_2010_25_pk" PRIMARY KEY, btree (stamp_inserted, ip_src, ip_dst,
port_src, port_dst, ip_proto)
    "ibytes_acct_2010_25" btree (bytes)

Here is my one query example (could add pk to flow and packet fields):

=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags,
ip_proto,SUM("bytes"),SUM("packets"),SUM("flows") FROM "acct_2010_25" WHERE
"stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21 10:10'
GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto order by
SUM(bytes) desc LIMIT 50 OFFSET 0;

         QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3998662.81..3998662.94 rows=50 width=50) (actual
time=276981.107..276981.133 rows=50 loops=1)
   ->  Sort  (cost=3998662.81..4001046.07 rows=953305 width=50) (actual
time=276981.105..276981.107 rows=50 loops=1)
         Sort Key: sum(bytes)
         ->  GroupAggregate  (cost=3499863.27..3754872.33 rows=953305
width=50) (actual time=165468.257..182677.580 rows=8182616 loops=1)
               ->  Sort  (cost=3499863.27..3523695.89 rows=9533049 width=50)
(actual time=165468.022..168908.828 rows=9494165 loops=1)
                     Sort Key: ip_src, port_src, ip_dst, port_dst,
tcp_flags, ip_proto
                     ->  Seq Scan on acct_2010_25  (cost=0.00..352648.10
rows=9533049 width=50) (actual time=0.038..50860.391 rows=9494165 loops=1)
                           Filter: ((stamp_inserted >= '2010-06-20
10:10:00'::timestamp without time zone) AND (stamp_inserted < '2010-06-21
10:10:00'::timestamp without time zone))
 Total runtime: 278791.661 ms
(9 registros)

Another one just summing bytes (still low):

=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags,
ip_proto,SUM("bytes") FROM "acct_2010_25" WHERE
"stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21 10:10'
GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto LIMIT 50
OFFSET 0;

      QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3395202.50..3395213.12 rows=50 width=42) (actual
time=106261.359..106261.451 rows=50 loops=1)
   ->  GroupAggregate  (cost=3395202.50..3602225.48 rows=974226 width=42)
(actual time=106261.357..106261.435 rows=50 loops=1)
         ->  Sort  (cost=3395202.50..3419558.14 rows=9742258 width=42)
(actual time=106261.107..106261.169 rows=176 loops=1)
               Sort Key: ip_src, port_src, ip_dst, port_dst, tcp_flags,
ip_proto
               ->  Seq Scan on acct_2010_25  (cost=0.00..367529.72
rows=9742258 width=42) (actual time=0.073..8058.598 rows=9494165 loops=1)
                     Filter: ((stamp_inserted >= '2010-06-20
10:10:00'::timestamp without time zone) AND (stamp_inserted < '2010-06-21
10:10:00'::timestamp without time zone))
 Total runtime: 109911.882 ms
(7 registros)


The server has 2 Intel(R) Xeon(R) CPU  E5430 @ 2.66GHz and 16GB RAM.
I'm using PostgreSQL 8.1.18 default config from Centos 5.5 (just
increased checkpoint_segments to 50).

What can I change to increase performance?

Thanks in advance.

Cheers.

-- 
Sergio Roberto Charpinel Jr.

Responses

pgsql-performance by date

Next:From: Alvaro HerreraDate: 2010-06-21 16:36:54
Subject: Re: Aggressive autovacuuming ?
Previous:From: Kevin GrittnerDate: 2010-06-21 13:53:44
Subject: Re: HashAggregate slower than sort?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group