From: | Don Bowman <don(at)sandvine(dot)com> |
---|---|
To: | "'pgsql-performance(at)postgresql(dot)org'" <pgsql-performance(at)postgresql(dot)org> |
Cc: | Don Bowman <don(at)sandvine(dot)com> |
Subject: | conceptual method to create high performance query involving time |
Date: | 2004-02-21 21:12:24 |
Message-ID: | FE045D4D9F7AED4CBFF1B3B813C85337045D81D5@mail.sandvine.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
I have a table which has e.g.
CREATE TABLE portstats
(
id serial,
logtime TIMESTAMP,
cluster VARCHAR(40),
element VARCHAR(40),
port INT,
rxOctets BIGINT,
txOctets BIGINT
);
which is used for logging statistics from network equipment.
cluster is like the location.
rxOctets, txOctets are numbers which increase over time.
Now, i would like to generate a chart which shows the
bitrate. So i need subtract rxOctets from a previous
value and divide by the time range.
To be efficient, and avoid fetching too many points, i
want the interval between points I select to be a function
of the time range. E.g., when I'm doing a 1-day chart,
i would like to select points that are 15min apart.
When I'm doing a 1-yr query, I would like to select
points that are e.g. 4hours apart. I can make this
determination in the script that generates the statement.
The problem i'm having is that this is
a) a very slow operation
b) selects all data points on t1, and then the interval
apart one on t2... so i still end up with too many
points.
points are logged every ~5 minutes, but there is
some small variation on the interval (and some observations
might be missing due to eg communication loss to db).
[ a process goes along later and decimates out points
as they age to prevent the db from becoming very large].
The query I have is below. The question is ... what
is the best strategy for an operation of this nature?
SELECT
t1.port,
t1.logtime AS start,
t2.logtime AS end,
t1.cluster,
t1.element,
(8.0 * (t2.rxoctets - t1.rxoctets) /
(extract(EPOCH FROM(t2.logtime - t1.logtime))))::int8 AS rxbps,
(8.0 * (t2.txoctets - t1.txoctets) /
(extract(EPOCH FROM(t2.logtime - t1.logtime))))::int8 AS txbps
FROM
portstats t1
INNER JOIN portstats t2
ON t2.cluster = t1.cluster
AND t2.element = t1.element
AND t2.port = t1.port
AND t2.logtime =
(SELECT logtime
FROM portstats t3
WHERE t3.cluster = t1.cluster
AND t3.element = t1.element
AND t3.port = t1.port
AND t3.logtime > t1.logtime + '00:15:00'
ORDER BY cluster ASC,
element ASC,
port ASC,
logtime ASC
LIMIT 1)
WHERE t1.cluster = 'somecluster'
AND (t1.element = 'somelement')
AND (t1.logtime BETWEEN '2004-01-07 00:00' AND '2004-02-08 00:00')
ORDER BY
t1.cluster ASC,
t1.element ASC,
t1.port ASC,
t1.logtime ASC
;
The query plan for 1 week is below, this takes ~2s to operate. It gets very
slow for 1yr.
Sort (cost=14055.35..14067.74 rows=4956 width=176) (actual
time=1523.956..1538.354 rows=5943 loops=1)
Sort Key: t1.svcluster, t1.element, t1.port, t1.logtime
-> Merge Join (cost=2304.49..13751.18 rows=4956 width=176) (actual
time=1008.620..1329.766 rows=5943 loops=1)
Merge Cond: (("outer"."?column10?" = "inner".logtime) AND
("outer".port = "inner".port))
-> Sort (cost=977.39..992.25 rows=5944 width=136) (actual
time=678.564..692.974 rows=5943 loops=1)
Sort Key: (subplan), t1.port
-> Index Scan using portstats_element_idx on portstats t1
(cost=0.00..604.78 rows=5944 width=136) (actual time=0.191..581.311 ro
ws=5943 loops=1)
Index Cond: (element = 'my-element.mydomain.net'::bpchar)
Filter: ((svcluster = 'my-cluster'::bpchar) AND (logtime
>= '2004-01-07 00:00:00-05'::timestamp with time zone) AND (logtime
<= '2004-02-08 00:00:00-05'::timestamp with time zone))
SubPlan
-> Limit (cost=0.00..0.62 rows=1 width=104) (actual
time=0.064..0.066 rows=1 loops=5943)
-> Index Scan using www6 on portstats t3
(cost=0.00..399.28 rows=643 width=104) (actual time=0.054..0.054 rows=1 l
oops=5943)
Index Cond: ((svcluster = $1) AND (element
= $2) AND (port = $3) AND (logtime > ($4 + '00:15:00'::interval)))
-> Sort (cost=1327.10..1356.00 rows=11560 width=136) (actual
time=289.168..321.522 rows=11771 loops=1)
Sort Key: t2.logtime, t2.port
-> Index Scan using portstats_element_idx on portstats t2
(cost=0.00..546.98 rows=11560 width=136) (actual time=0.103..192.027 r
ows=11560 loops=1)
Index Cond: ('my-element.mydomain.net'::bpchar = element)
Filter: (('my-cluster'::bpchar = svcluster))
Total runtime: 1609.411 ms
(19 rows)
From | Date | Subject | |
---|---|---|---|
Next Message | John Siracusa | 2004-02-22 00:18:04 | Column correlation drifts, index ignored again |
Previous Message | Sean Shanny | 2004-02-20 21:57:42 | Re: General performance questions about postgres on Apple |