Re: Low performance SUM and Group by large database

From: "Sergio Charpinel Jr(dot)" <sergiocharpinel(at)gmail(dot)com>
To: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Low performance SUM and Group by large database
Date: 2010-06-23 11:40:39
Message-ID: AANLkTinvVEwYZKo_NhiPKqRKrvU2heY8ypCYJ4vaJmEt@mail.gmail.com
Lists: pgsql-performance

Craig, Russell,

I appreciate your help.

Thanks.

2010/6/22 Russell Smith <mr-russ(at)pws(dot)com(dot)au>

> On 22/06/10 00:42, Sergio Charpinel Jr. wrote:
> > Hi,
> >
> [snip]
> >
> > => explain analyze SELECT ip_src, port_src, ip_dst, port_dst,
> > tcp_flags, ip_proto,SUM("bytes"),SUM("packets"),SUM("flows") FROM
> > "acct_2010_25" WHERE "stamp_inserted">='2010-06-20 10:10' AND
> > "stamp_inserted"<'2010-06-21 10:10' GROUP BY ip_src, port_src, ip_dst,
> > port_dst, tcp_flags, ip_proto order by SUM(bytes) desc LIMIT 50 OFFSET 0;
> >
> > QUERY PLAN
> >
> >
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=3998662.81..3998662.94 rows=50 width=50) (actual
> > time=276981.107..276981.133 rows=50 loops=1)
> > -> Sort (cost=3998662.81..4001046.07 rows=953305 width=50)
> > (actual time=276981.105..276981.107 rows=50 loops=1)
> > Sort Key: sum(bytes)
> > -> GroupAggregate (cost=3499863.27..3754872.33 rows=953305
> > width=50) (actual time=165468.257..182677.580 rows=8182616 loops=1)
> > -> Sort (cost=3499863.27..3523695.89 rows=9533049
> > width=50) (actual time=165468.022..168908.828 rows=9494165 loops=1)
> > Sort Key: ip_src, port_src, ip_dst, port_dst,
> > tcp_flags, ip_proto
>
> You are having to sort and aggregate a large number of rows before you
> can get the top 50. That's roughly 9.5 million rows in this case at
> width 50, i.e. a 450MB+ sort. That's going to be slow because the sort
> will spill to disk unless you bump sort memory up to around 500MB (a
> bad idea). So unless you have really fast storage for temporary files,
> it's going to take a while; the couple of minutes you are experiencing
> at the moment is probably not too bad.
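That back-of-the-envelope figure follows directly from the planner's own estimates in the EXPLAIN output above (about 9.5 million rows at width 50 bytes); a quick sanity check of the arithmetic, using the actual row count from the plan:

```sql
-- ~9.5M rows * 50 bytes per row, per the plan above: roughly 450 MB
SELECT pg_size_pretty(9494165::bigint * 50);
```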
>
> I'm sure improvements have been made in this area since 8.1, and if you
> are able to upgrade to 8.4, which CentOS 5 also offers now, you might
> see a benefit there. I can't remember the specific changes, but I
> believe sorting speed has improved, and EXPLAIN ANALYZE will also tell
> you more about what's going on, including whether each sort ran on disk
> or in memory.
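For illustration, newer releases annotate each sort node with how the sort was performed. The numbers below are made up, but the shape of the extra line looks like this:

```
Sort  (cost=... rows=9533049 width=50) (actual time=... rows=9494165 loops=1)
  Sort Key: ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
  Sort Method:  external merge  Disk: 463584kB
```

A "quicksort  Memory: ...kB" line instead would mean the sort fit entirely in work_mem.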
>
> > -> Seq Scan on acct_2010_25
> > (cost=0.00..352648.10 rows=9533049 width=50) (actual
> > time=0.038..50860.391 rows=9494165 loops=1)
> > Filter: ((stamp_inserted >= '2010-06-20
> > 10:10:00'::timestamp without time zone) AND (stamp_inserted <
> > '2010-06-21 10:10:00'::timestamp without time zone))
> > Total runtime: 278791.661 ms
> > (9 rows)
> >
> > Another one, just summing bytes (still slow):
> >
> > => explain analyze SELECT ip_src, port_src, ip_dst, port_dst,
> > tcp_flags, ip_proto,SUM("bytes") FROM "acct_2010_25" WHERE
> > "stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21
> > 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags,
> > ip_proto LIMIT 50 OFFSET 0;
> >
> > QUERY PLAN
> >
> >
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=3395202.50..3395213.12 rows=50 width=42) (actual
> > time=106261.359..106261.451 rows=50 loops=1)
> > -> GroupAggregate (cost=3395202.50..3602225.48 rows=974226
> > width=42) (actual time=106261.357..106261.435 rows=50 loops=1)
> > -> Sort (cost=3395202.50..3419558.14 rows=9742258 width=42)
> > (actual time=106261.107..106261.169 rows=176 loops=1)
> > Sort Key: ip_src, port_src, ip_dst, port_dst,
> > tcp_flags, ip_proto
> > -> Seq Scan on acct_2010_25 (cost=0.00..367529.72
> > rows=9742258 width=42) (actual time=0.073..8058.598 rows=9494165 loops=1)
> > Filter: ((stamp_inserted >= '2010-06-20
> > 10:10:00'::timestamp without time zone) AND (stamp_inserted <
> > '2010-06-21 10:10:00'::timestamp without time zone))
> > Total runtime: 109911.882 ms
> > (7 rows)
> >
> >
> > The server has two Intel(R) Xeon(R) E5430 CPUs @ 2.66GHz and 16GB of
> > RAM. I'm using PostgreSQL 8.1.18 with the default config from CentOS
> > 5.5 (I only increased checkpoint_segments to 50).
>
> Checkpoint segments won't help you here: the number of segments governs
> how fast writes to the database can happen, not how fast queries can
> read and sort data.
>
> >
> > What can I change to increase performance?
>
> Increasing sort memory (work_mem) will give you a speed benefit even
> though the sort still goes to disk. I don't know how much spare memory
> you have, but trying values between 8MB and 128MB may be useful just
> for these specific query runs. If you can afford 512MB for each of the
> two sorts, go for that, but as mentioned it's dangerous because of the
> risk of using more RAM than the machine has: work_mem allocates that
> amount of memory per sort.
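work_mem can be raised for a single session rather than server-wide, which keeps the experiment safe. A sketch of one trial run (128MB is just one of the sizes suggested above; note that 8.1 expects the value in kB, while 8.2+ also accepts unit strings like '128MB'):

```sql
-- Raise sort memory for this session only (131072 kB = 128MB on 8.1)
SET work_mem = 131072;

EXPLAIN ANALYZE
SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto,
       SUM(bytes), SUM(packets), SUM(flows)
  FROM acct_2010_25
 WHERE stamp_inserted >= '2010-06-20 10:10'
   AND stamp_inserted <  '2010-06-21 10:10'
 GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
 ORDER BY SUM(bytes) DESC
 LIMIT 50;

-- Return to the server default afterwards
RESET work_mem;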
>
> If you are running these queries all the time, a summary table that
> produces these reports on a regular basis, maybe daily or even hourly,
> would be useful. Basically, the sheer amount of information that needs
> to be processed and sorted is what's taking all the time here.
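A minimal sketch of that approach, assuming a hypothetical acct_summary table refreshed hourly from cron; the table name and column types here are illustrative guesses, not taken from the original schema:

```sql
-- One-off: summary table with the same grouping columns plus an hour bucket
CREATE TABLE acct_summary (
    stamp_hour  timestamp without time zone,
    ip_src      inet,
    port_src    integer,
    ip_dst      inet,
    port_dst    integer,
    tcp_flags   integer,
    ip_proto    smallint,
    bytes       bigint,
    packets     bigint,
    flows       bigint
);

-- Run hourly (e.g. from cron): fold the previous hour into the summary
INSERT INTO acct_summary
SELECT date_trunc('hour', stamp_inserted), ip_src, port_src, ip_dst,
       port_dst, tcp_flags, ip_proto,
       SUM(bytes), SUM(packets), SUM(flows)
  FROM acct_2010_25
 WHERE stamp_inserted >= date_trunc('hour', now()) - interval '1 hour'
   AND stamp_inserted <  date_trunc('hour', now())
 GROUP BY 1, 2, 3, 4, 5, 6, 7;
```

The top-50 report then aggregates the much smaller pre-summarized table instead of re-sorting 9.5 million raw rows on every run.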
>
> Regards
>
> Russell
>

--
Sergio Roberto Charpinel Jr.
