*very* slow query to summarize data for a month ...

From: "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: *very* slow query to summarize data for a month ...
Date: 2003-11-10 20:18:52
Message-ID: 20031110160144.C727@ganymede.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Table structure is simple:

CREATE TABLE traffic_logs (
company_id bigint,
ip_id bigint,
port integer,
bytes bigint,
runtime timestamp without time zone
);

runtime is 'day of month' ...

I need to summarize the month, per company, with a query as:

explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
FROM company c, traffic_logs ts
WHERE c.company_id = ts.company_id
AND month_trunc(ts.runtime) = '2003-10-01'
GROUP BY company_name,ts.company_id;

and the explain looks like:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=32000.94..32083.07 rows=821 width=41) (actual time=32983.36..47586.17 rows=144 loops=1)
-> Group (cost=32000.94..32062.54 rows=8213 width=41) (actual time=32957.40..42817.88 rows=462198 loops=1)
-> Sort (cost=32000.94..32021.47 rows=8213 width=41) (actual time=32957.38..36261.31 rows=462198 loops=1)
Sort Key: c.company_name, ts.company_id
-> Merge Join (cost=31321.45..31466.92 rows=8213 width=41) (actual time=13983.07..22642.14 rows=462198 loops=1)
Merge Cond: ("outer".company_id = "inner".company_id)
-> Sort (cost=24.41..25.29 rows=352 width=25) (actual time=5.52..7.40 rows=348 loops=1)
Sort Key: c.company_id
-> Seq Scan on company c (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.78 rows=352 loops=1)
-> Sort (cost=31297.04..31317.57 rows=8213 width=16) (actual time=13977.49..16794.41 rows=462198 loops=1)
Sort Key: ts.company_id
-> Index Scan using tl_month on traffic_logs ts (cost=0.00..30763.02 rows=8213 width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
Total runtime: 47587.82 msec
(14 rows)

the problem is that we're only taking a few months worth of data, so I
don't think there is much of a way of 'improve performance' on this, but
figured I'd ask quickly before I do something rash ...

Note that without the month_trunc() index, the Total runtime more then
doubles:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=39578.63..39660.76 rows=821 width=41) (actual time=87805.47..101251.35 rows=144 loops=1)
-> Group (cost=39578.63..39640.23 rows=8213 width=41) (actual time=87779.56..96824.56 rows=462198 loops=1)
-> Sort (cost=39578.63..39599.17 rows=8213 width=41) (actual time=87779.52..90781.48 rows=462198 loops=1)
Sort Key: c.company_name, ts.company_id
-> Merge Join (cost=38899.14..39044.62 rows=8213 width=41) (actual time=64073.98..72783.68 rows=462198 loops=1)
Merge Cond: ("outer".company_id = "inner".company_id)
-> Sort (cost=24.41..25.29 rows=352 width=25) (actual time=64.66..66.55 rows=348 loops=1)
Sort Key: c.company_id
-> Seq Scan on company c (cost=0.00..9.52 rows=352 width=25) (actual time=1.76..61.70 rows=352 loops=1)
-> Sort (cost=38874.73..38895.27 rows=8213 width=16) (actual time=64009.26..66860.71 rows=462198 loops=1)
Sort Key: ts.company_id
-> Seq Scan on traffic_logs ts (cost=0.00..38340.72 rows=8213 width=16) (actual time=5.02..-645982.04 rows=462198 loops=1)
Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
Total runtime: 101277.17 msec
(14 rows)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Patrick Hatcher 2003-11-10 20:31:08 Re: *very* slow query to summarize data for a month ...
Previous Message Rajesh Kumar Mallah 2003-11-10 16:27:20 Re: [PERFORM] error while executing a c program with embedded sql