Time intersect query

From: Brad Murray <brad(at)lightcastmedia(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Time intersect query
Date: 2009-03-23 04:44:58
Message-ID: 1e282ad30903222144i480939et44a255c11753e83f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I'm wanting to optimize and improve a query to get the maximum number of
users over a period of time. What I'm trying to accomplish is to get
graphable data points of the maximum number of simultaneous users at a
specified interval over a period of time, preferably with only a single pass
through the data.

----------
streamlog table (I've only included relevant fields and indexes):
id bigint
ts timestamp
viewtime integer
client integer
-- primary key on id field
-- streamlog_tsrange_index btree (client, ts, startts(ts, viewtime))

\df+ startts

List of functions
Schema | Name | Result data type | Argument data
types | Volatility | Owner | Language | Sou
rce code |
Description

--------+---------+-----------------------------+--------------------------------------+------------+-------+----------+-----------------------
-----------------------------+-------------
public | startts | timestamp without time zone | timestamp without time
zone, integer | immutable | root | sql | select $1 - ($2::varch
ar || ' seconds')::interval; |

The ts timestamp is the time which the data was logged.
The viewtime is the amount of time the user was online in seconds
the startts function determines when the session started by subtracting
viewtime from ts and returning a timestamp

-----------------

My current procedure...
1) Create temporary table with each possible data point. This example uses
recursive functions from pgsql 8.4 but was originally implemented by using
large numbers of queries from php. My knowledge of the recursive functions
is pretty week, but I was able to put this together without too much
trouble.

create temp table timerange as with recursive f as (
select '2009-03-21 18:20:00'::timestamp as a
union all
select a+'30 seconds'::interval as a from f where a < '2009-03-21
20:20:00'::timestamp
) select a from f;

2) Update table with record counts
alter table timerange add column mycount integer;
explain analyze update timerange set mycount = (select count(*) from
streamlogfoo where client = 3041 and a between startts(ts,viewtime) and ts);

-----------------

This seems to work reasonably well, with the following exceptions...

1) The number reported is the number at the set time period, not the highest
value between each data point. With a 30 second interval, this isn't a big
problem, but with larger intervals gives results that do not match what I'm
looking for (maximum users).
2) This does not scale well for large numbers of points, as internally each
data point is a complete scan through the data, even though most of the data
points will be common for the entire range.

I'm thinking this would be a good use for the new window functions, but I'm
not sure where to begin. Any ideas?

-----------------

Responses

Browse pgsql-general by date

  From Date Subject
Next Message DM 2009-03-23 05:44:28 pg_restore error - Any Idea?
Previous Message Harvey, Allan AC 2009-03-23 04:18:13 Re: bash & postgres