Re: Low perfomance SUM and Group by large databse

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: "Sergio Charpinel Jr(dot)" <sergiocharpinel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Low perfomance SUM and Group by large databse
Date: 2010-06-21 22:06:47
Message-ID: 4C1FE277.5030603@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 21/06/10 22:42, Sergio Charpinel Jr. wrote:
> Hi,
>
> I'm getting low performance on SUM and GROUP BY queries.
> How can I improve my database to perform such queries.

> -> Sort (cost=3499863.27..3523695.89 rows=9533049
> width=50) (actual time=165468.022..168908.828 rows=9494165 loops=1)
> Sort Key: ip_src, port_src, ip_dst, port_dst,
> tcp_flags, ip_proto
> -> Seq Scan on acct_2010_25 (cost=0.00..352648.10
> rows=9533049 width=50) (actual time=0.038..50860.391 rows=9494165 loops=1)
> Filter: ((stamp_inserted >= '2010-06-20
> 10:10:00'::timestamp without time zone) AND (stamp_inserted <
> '2010-06-21 10:10:00'::timestamp without time zone))

Provide an index on at least (ip_src,port_src,ip_dst,port_dst). If you
frequently do other queries that only want some of that information you
could create several individual indexes for those columns instead, as Pg
will combine them for a query, but that is much less efficient than an
index across all four columns.

CREATE INDEX ip_peers_idx ON acct_2010_25(ip_src,port_src,ip_dst_port_dst);

Every index added costs you insert/update/delete speed, so try to find
the smallest/simplest index that gives you acceptable performance.

> Another one just summing bytes (still low):
>
> => explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags,
> ip_proto,SUM("bytes") FROM "acct_2010_25" WHERE
> "stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21
> 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
> LIMIT 50 OFFSET 0;

Same deal. You have no suitable index, so Pg has to do a sequential scan
of the table. Since you appear to query on stamp_inserted a lot, you
should index it.

--
Craig Ringer

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2010-06-21 22:14:39 Re: PostgreSQL as a local in-memory cache
Previous Message Scott Marlowe 2010-06-21 18:08:05 Re: mysql to postgresql, performance questions