Re: best db schema for time series data?

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: best db schema for time series data?
Date: 2010-11-16 16:35:24
Message-ID: 8762vxfcqb.fsf@cbbrowne.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

vindex+lists-pgsql-performance(at)apartia(dot)org (Louis-David Mitterrand)
writes:
> I have to collect lots of prices from web sites and keep track of their
> changes. What is the best option?
>
> 1) one 'price' row per price change:
>
> create table price (
> id_price primary key,
> id_product integer references product,
> price integer
> );
>
> 2) a single 'price' row containing all the changes:
>
> create table price (
> id_price primary key,
> id_product integer references product,
> price integer[] -- prices are 'pushed' on this array as they change
> );
>
> Which is bound to give the best performance, knowing I will often need
> to access the latest and next-to-latest prices?

I'd definitely bias towards #1, but with a bit of a change...

create table product (
id_product serial primary key
);

create table price (
id_product integer references product,
as_at timestamptz default now(),
primary key (id_product, as_at),
price integer
);

The query to get the last 5 prices for a product should be
splendidly efficient:

select price, as_at from price
where id_product = 17
order by as_at desc limit 5;

(That'll use the PK index perfectly nicely.)

If you needed higher performance, for "latest price," then I'd add a
secondary table, and use triggers to copy latest price into place:

create table latest_prices (
id_product integer primary key references product,
price integer
);

create or replace function capture_latest_price () returns trigger as $$
declare
begin
delete from latest_prices where id_product = NEW.id_product;
insert into latest_prices (id_product,price) values
(NEW.id_product, NEW.price);
return NEW;
end
$$ language plpgsql;

create trigger price_capture after insert on price execute procedure capture_latest_price();

This captures *just* the latest price for each product. (There's a bit
of race condition - if there are two concurrent price updates, one will
fail, which wouldn't happen without this trigger in place.)
--
"... Turns out that JPG was in fact using his brain... and I am
inclined to encourage him to continue the practice even if it isn't
exactly what I would have done myself." -- Alan Bawden (way out of
context)

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2010-11-16 20:39:20 Re: Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Previous Message Harald Fuchs 2010-11-16 16:28:19 Re: best db schema for time series data?