From: "Sergio Charpinel Jr(dot)" <sergiocharpinel(at)gmail(dot)com>
To: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Low perfomance SUM and Group by large databse
Date: 2010-06-23 11:40:39
Message-ID: AANLkTinvVEwYZKo_NhiPKqRKrvU2heY8ypCYJ4vaJmEt@mail.gmail.com
Lists: pgsql-performance
Craig, Russell,
I appreciate your help.
Thanks.
2010/6/22 Russell Smith <mr-russ(at)pws(dot)com(dot)au>
> On 22/06/10 00:42, Sergio Charpinel Jr. wrote:
> > Hi,
> >
> [snip]
> >
> > => explain analyze SELECT ip_src, port_src, ip_dst, port_dst,
> > tcp_flags, ip_proto,SUM("bytes"),SUM("packets"),SUM("flows") FROM
> > "acct_2010_25" WHERE "stamp_inserted">='2010-06-20 10:10' AND
> > "stamp_inserted"<'2010-06-21 10:10' GROUP BY ip_src, port_src, ip_dst,
> > port_dst, tcp_flags, ip_proto order by SUM(bytes) desc LIMIT 50 OFFSET 0;
> >
> > QUERY PLAN
> >
> >
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=3998662.81..3998662.94 rows=50 width=50) (actual
> > time=276981.107..276981.133 rows=50 loops=1)
> > -> Sort (cost=3998662.81..4001046.07 rows=953305 width=50)
> > (actual time=276981.105..276981.107 rows=50 loops=1)
> > Sort Key: sum(bytes)
> > -> GroupAggregate (cost=3499863.27..3754872.33 rows=953305
> > width=50) (actual time=165468.257..182677.580 rows=8182616 loops=1)
> > -> Sort (cost=3499863.27..3523695.89 rows=9533049
> > width=50) (actual time=165468.022..168908.828 rows=9494165 loops=1)
> > Sort Key: ip_src, port_src, ip_dst, port_dst,
> > tcp_flags, ip_proto
>
> You are having to sort and aggregate a large number of rows before you
> can get the top 50. That's about 9 million rows in this case, and at
> width 50 that is a 400MB+ sort. It's going to be slow because the sort
> has to run on disk unless you bump sort memory up to around 500MB (a
> bad idea). So unless you have really fast storage for temporary files
> it's going to take a while. The roughly 4.5 minutes you are
> experiencing at the moment is probably not too bad.
>
> I'm sure improvements have been made in this area since 8.1, and if
> you are able to upgrade to 8.4, which CentOS 5 now also offers, you
> might see a benefit there. I can't remember the specific changes, but
> I believe sorting speed has improved, and EXPLAIN ANALYZE will also
> give you more information about whether each sort ran in memory or on
> disk.
>
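For illustration, 8.4's EXPLAIN ANALYZE adds a "Sort Method" line to each sort node, showing directly whether the sort ran in memory (quicksort) or spilled to disk (external merge). The cost and row figures below are taken from the plan in this thread; the Disk figure is hypothetical:

```
Sort  (cost=3499863.27..3523695.89 rows=9533049 width=50)
      (actual time=165468.022..168908.828 rows=9494165 loops=1)
  Sort Key: ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
  Sort Method:  external merge  Disk: 501080kB
```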
> > -> Seq Scan on acct_2010_25
> > (cost=0.00..352648.10 rows=9533049 width=50) (actual
> > time=0.038..50860.391 rows=9494165 loops=1)
> > Filter: ((stamp_inserted >= '2010-06-20
> > 10:10:00'::timestamp without time zone) AND (stamp_inserted <
> > '2010-06-21 10:10:00'::timestamp without time zone))
> > Total runtime: 278791.661 ms
> > (9 rows)
> >
> > Another one just summing bytes (still low):
> >
> > => explain analyze SELECT ip_src, port_src, ip_dst, port_dst,
> > tcp_flags, ip_proto,SUM("bytes") FROM "acct_2010_25" WHERE
> > "stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21
> > 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags,
> > ip_proto LIMIT 50 OFFSET 0;
> >
> > QUERY PLAN
> >
> >
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=3395202.50..3395213.12 rows=50 width=42) (actual
> > time=106261.359..106261.451 rows=50 loops=1)
> > -> GroupAggregate (cost=3395202.50..3602225.48 rows=974226
> > width=42) (actual time=106261.357..106261.435 rows=50 loops=1)
> > -> Sort (cost=3395202.50..3419558.14 rows=9742258 width=42)
> > (actual time=106261.107..106261.169 rows=176 loops=1)
> > Sort Key: ip_src, port_src, ip_dst, port_dst,
> > tcp_flags, ip_proto
> > -> Seq Scan on acct_2010_25 (cost=0.00..367529.72
> > rows=9742258 width=42) (actual time=0.073..8058.598 rows=9494165 loops=1)
> > Filter: ((stamp_inserted >= '2010-06-20
> > 10:10:00'::timestamp without time zone) AND (stamp_inserted <
> > '2010-06-21 10:10:00'::timestamp without time zone))
> > Total runtime: 109911.882 ms
> > (7 rows)
> >
> >
> > The server has 2 Intel(R) Xeon(R) CPU E5430 @ 2.66GHz and 16GB RAM.
> > I'm using PostgreSQL 8.1.18 default config from Centos 5.5 (just
> > increased checkpoint_segments to 50).
>
> checkpoint_segments won't help you here: that setting controls how
> much WAL can accumulate between checkpoints, so it affects how fast
> writes to the database can happen, not read-heavy queries like these.
>
> >
> > What can I change to increase performance?
>
> Increasing sort memory (work_mem) will give you speed benefits even
> though the sorts will still go to disk. I don't know how much spare
> memory you have, but trying values between 8MB and 128MB may be
> useful just for these specific query runs. If you can afford 512MB
> for each of the two sorts, go for that, but as mentioned it's
> dangerous due to the risk of using more RAM than you have: work_mem
> is allocated per sort, not per query or per connection.
>
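As a sketch of that advice, work_mem can be raised for just the session running the report rather than server-wide; the 128MB value below is illustrative, not a recommendation:

```sql
-- Raise sort memory only for this session.
-- On 8.1 the value is in KB (unit suffixes like '128MB' arrived in 8.2).
SET work_mem = 131072;

EXPLAIN ANALYZE
SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto,
       SUM(bytes), SUM(packets), SUM(flows)
FROM acct_2010_25
WHERE stamp_inserted >= '2010-06-20 10:10'
  AND stamp_inserted <  '2010-06-21 10:10'
GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
ORDER BY SUM(bytes) DESC
LIMIT 50;

RESET work_mem;  -- back to the server default
```

Keep in mind that every concurrent sort may allocate up to this amount, so large values are only safe for one-off analytical sessions.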
> If you are running these queries all the time, a summary table that
> produces these reports on a regular basis, maybe daily or even
> hourly, would be useful. Basically, the large amount of information
> that needs to be processed and sorted is what's taking all the time
> here.
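A minimal sketch of that summary-table idea, assuming an hourly rollup table named acct_2010_25_hourly (the table name and schedule are hypothetical, run e.g. from cron):

```sql
-- Hypothetical hourly rollup of the raw accounting table.
CREATE TABLE acct_2010_25_hourly AS
SELECT date_trunc('hour', stamp_inserted) AS stamp_hour,
       ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto,
       SUM(bytes)   AS bytes,
       SUM(packets) AS packets,
       SUM(flows)   AS flows
FROM acct_2010_25
GROUP BY 1, 2, 3, 4, 5, 6, 7;

-- The report then aggregates far fewer, pre-summed rows:
SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto,
       SUM(bytes), SUM(packets), SUM(flows)
FROM acct_2010_25_hourly
WHERE stamp_hour >= '2010-06-20 10:00'
  AND stamp_hour <  '2010-06-21 10:00'
GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
ORDER BY SUM(bytes) DESC
LIMIT 50;
```

Because each hour's traffic is summed once at load time, the expensive sort runs over the (much smaller) rollup instead of 9+ million raw rows.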
>
> Regards
>
> Russell
>
--
Sergio Roberto Charpinel Jr.