conceptual method to create high performance query involving time

From: Don Bowman <don(at)sandvine(dot)com>
To: "'pgsql-performance(at)postgresql(dot)org'" <pgsql-performance(at)postgresql(dot)org>
Cc: Don Bowman <don(at)sandvine(dot)com>
Subject: conceptual method to create high performance query involving time
Date: 2004-02-21 21:12:24
Message-ID: FE045D4D9F7AED4CBFF1B3B813C85337045D81D5@mail.sandvine.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


I have a table which has e.g.

CREATE TABLE portstats
(
id serial,
logtime TIMESTAMP,
cluster VARCHAR(40),
element VARCHAR(40),
port INT,
rxOctets BIGINT,
txOctets BIGINT
);

which is used for logging statistics from network equipment.
cluster is like the location.
rxOctets, txOctets are numbers which increase over time.

Now, i would like to generate a chart which shows the
bitrate. So i need subtract rxOctets from a previous
value and divide by the time range.

To be efficient, and avoid fetching too many points, i
want the interval between points I select to be a function
of the time range. E.g., when I'm doing a 1-day chart,
i would like to select points that are 15min apart.
When I'm doing a 1-yr query, I would like to select
points that are e.g. 4hours apart. I can make this
determination in the script that generates the statement.

The problem i'm having is that this is
a) a very slow operation
b) selects all data points on t1, and then the interval
apart one on t2... so i still end up with too many
points.

points are logged every ~5 minutes, but there is
some small variation on the interval (and some observations
might be missing due to eg communication loss to db).
[ a process goes along later and decimates out points
as they age to prevent the db from becoming very large].

The query I have is below. The question is ... what
is the best strategy for an operation of this nature?

SELECT
t1.port,
t1.logtime AS start,
t2.logtime AS end,
t1.cluster,
t1.element,
(8.0 * (t2.rxoctets - t1.rxoctets) /
(extract(EPOCH FROM(t2.logtime - t1.logtime))))::int8 AS rxbps,
(8.0 * (t2.txoctets - t1.txoctets) /
(extract(EPOCH FROM(t2.logtime - t1.logtime))))::int8 AS txbps
FROM
portstats t1
INNER JOIN portstats t2
ON t2.cluster = t1.cluster
AND t2.element = t1.element
AND t2.port = t1.port
AND t2.logtime =
(SELECT logtime
FROM portstats t3
WHERE t3.cluster = t1.cluster
AND t3.element = t1.element
AND t3.port = t1.port
AND t3.logtime > t1.logtime + '00:15:00'
ORDER BY cluster ASC,
element ASC,
port ASC,
logtime ASC
LIMIT 1)
WHERE t1.cluster = 'somecluster'
AND (t1.element = 'somelement')
AND (t1.logtime BETWEEN '2004-01-07 00:00' AND '2004-02-08 00:00')
ORDER BY
t1.cluster ASC,
t1.element ASC,
t1.port ASC,
t1.logtime ASC
;

The query plan for 1 week is below, this takes ~2s to operate. It gets very
slow for 1yr.

Sort (cost=14055.35..14067.74 rows=4956 width=176) (actual
time=1523.956..1538.354 rows=5943 loops=1)
Sort Key: t1.svcluster, t1.element, t1.port, t1.logtime
-> Merge Join (cost=2304.49..13751.18 rows=4956 width=176) (actual
time=1008.620..1329.766 rows=5943 loops=1)
Merge Cond: (("outer"."?column10?" = "inner".logtime) AND
("outer".port = "inner".port))
-> Sort (cost=977.39..992.25 rows=5944 width=136) (actual
time=678.564..692.974 rows=5943 loops=1)
Sort Key: (subplan), t1.port
-> Index Scan using portstats_element_idx on portstats t1
(cost=0.00..604.78 rows=5944 width=136) (actual time=0.191..581.311 ro
ws=5943 loops=1)
Index Cond: (element = 'my-element.mydomain.net'::bpchar)
Filter: ((svcluster = 'my-cluster'::bpchar) AND (logtime
>= '2004-01-07 00:00:00-05'::timestamp with time zone) AND (logtime
<= '2004-02-08 00:00:00-05'::timestamp with time zone))
SubPlan
-> Limit (cost=0.00..0.62 rows=1 width=104) (actual
time=0.064..0.066 rows=1 loops=5943)
-> Index Scan using www6 on portstats t3
(cost=0.00..399.28 rows=643 width=104) (actual time=0.054..0.054 rows=1 l
oops=5943)
Index Cond: ((svcluster = $1) AND (element
= $2) AND (port = $3) AND (logtime > ($4 + '00:15:00'::interval)))
-> Sort (cost=1327.10..1356.00 rows=11560 width=136) (actual
time=289.168..321.522 rows=11771 loops=1)
Sort Key: t2.logtime, t2.port
-> Index Scan using portstats_element_idx on portstats t2
(cost=0.00..546.98 rows=11560 width=136) (actual time=0.103..192.027 r
ows=11560 loops=1)
Index Cond: ('my-element.mydomain.net'::bpchar = element)
Filter: (('my-cluster'::bpchar = svcluster))
Total runtime: 1609.411 ms
(19 rows)

Browse pgsql-performance by date

  From Date Subject
Next Message John Siracusa 2004-02-22 00:18:04 Column correlation drifts, index ignored again
Previous Message Sean Shanny 2004-02-20 21:57:42 Re: General performance questions about postgres on Apple