time series query

From: Jaime Silvela <JSilvela(at)Bear(dot)com>
To:
Cc: pgsql-general(at)postgresql(dot)org
Subject: time series query
Date: 2007-03-30 13:30:25
Message-ID: 460D10F1.8040706@bear.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

The problem I'm trying to solve is pretty standard. I have a table that
records measurements of different types at different times.

CREATE TABLE measurements (
obj_id int4,
val_type_id int4 references lu_val_type(val_type_id),
val numeric,
observation_date date
);

I want a query as simple and fast as possible to return the latest
observation of each type for each object.
I sent a message to this list a while ago, and the suggestion I found to
be the best compromise of clarity and speed was:
a) create an index on (obj_id, val_type_id, observation_date)
b) the "obvious" query becomes fast thanks to the index.
select ms.*
from (
select obj_id, val_type_id, max(observation_date) as
observation_date
from measurements
group by obj_id, val_type_id
) ms_last
join measurements ms using (obj_id, val_type_id, observation_date);

It still bugged me a bit that this requires a JOIN, especially since in
a procedural language, it would have been so easy to return the value
associated with the max(observation_date).
I think I've found a pretty good alternative. This at the moment works
if we keep track of time with an integer, rather than a date, but it
would be readily extensible.

The idea is to in fact, associate the value with the
max(observation_date) like so:
select obj_id, val_type_id, max(array[observation_date, val])
group by obj_id, val_type_id;

There are two caveats:
a) array requires elements to be of the same type, so observation_type
must be kept as "time from"
b) a row constructor would be ideal here, but there is now max function
for rowtypes.

If I did have a max() function for row types, it would be clean to do this:
select obj_id, val_type_id, max(row(observation_date, val))
group by obj_id, val_type_id;

Now, it seems that since rowtype comparison is built in, it should be
pretty easy to build a max() aggregate for it. Has anybody done this?
I'd have looked at the code for max(anyarray) but I don't know how to
access it. Can someone point me in the right direction?

Also, has someone thought about this before? I'm wondering if there will
be a speed gain coming from this.

Thank you,
Jaime

***********************************************************************
Bear Stearns is not responsible for any recommendation, solicitation,
offer or agreement or any information about any transaction, customer
account or account activity contained in this communication.

Bear Stearns does not provide tax, legal or accounting advice. You
should consult your own tax, legal and accounting advisors before
engaging in any transaction. In order for Bear Stearns to comply with
Internal Revenue Service Circular 230 (if applicable), you are notified
that any discussion of U.S. federal tax issues contained or referred to
herein is not intended or written to be used, and cannot be used, for
the purpose of: (A) avoiding penalties that may be imposed under the
Internal Revenue Code; nor (B) promoting, marketing or recommending to
another party any transaction or matter addressed herein.
***********************************************************************

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Max.Kaufmann 2007-03-30 13:53:56 job opportunity with Hedge Fund - NYC
Previous Message Sorin N. Ciolofan 2007-03-30 13:20:09 Re: ERROR: out of shared memory