Skip site navigation (1) Skip section navigation (2)

Re: best db schema for time series data?

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: best db schema for time series data?
Date: 2010-11-16 16:35:24
Message-ID: 8762vxfcqb.fsf@cbbrowne.afilias-int.info (view raw or flat)
Thread:
Lists: pgsql-performance
vindex+lists-pgsql-performance(at)apartia(dot)org (Louis-David Mitterrand)
writes:
> I have to collect lots of prices from web sites and keep track of their
> changes. What is the best option?
>
> 1) one 'price' row per price change:
>
> 	create table price (
> 		id_price primary key,
> 		id_product integer references product,
> 		price integer
> 	);
>
> 2) a single 'price' row containing all the changes:
>
> 	create table price (
> 		id_price primary key,
> 		id_product integer references product,
> 		price integer[] -- prices are 'pushed' on this array as they change
> 	);
>
> Which is bound to give the best performance, knowing I will often need
> to access the latest and next-to-latest prices?

I'd definitely bias towards #1, but with a bit of a change...

create table product (
  id_product serial primary key
);

create table price (
   id_product integer references product,
   as_at timestamptz default now(),
   primary key (id_product, as_at),
   price integer
);

The query to get the last 5 prices for a product should be
splendidly efficient:

   select price, as_at from price
    where id_product = 17
    order by as_at desc limit 5;

(That'll use the PK index perfectly nicely.)

If you needed higher performance, for "latest price," then I'd add a
secondary table, and use triggers to copy latest price into place:

  create table latest_prices (
     id_product integer primary key references product,
     price integer
  );

create or replace function capture_latest_price () returns trigger as $$
declare
begin
	delete from latest_prices where id_product = NEW.id_product;
	insert into latest_prices (id_product,price) values
	   (NEW.id_product, NEW.price);
	return NEW;
end
$$ language plpgsql;

create trigger price_capture after insert on price execute procedure capture_latest_price();

This captures *just* the latest price for each product.  (There's a bit
of race condition - if there are two concurrent price updates, one will
fail, which wouldn't happen without this trigger in place.)
--
"... Turns   out that JPG  was in  fact using his  brain... and   I am
inclined to encourage him  to continue the practice  even if  it isn't
exactly what I  would have done myself."   -- Alan Bawden  (way out of
context)

In response to

Responses

pgsql-performance by date

Next:From: Greg SmithDate: 2010-11-16 20:39:20
Subject: Re: Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Previous:From: Harald FuchsDate: 2010-11-16 16:28:19
Subject: Re: best db schema for time series data?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group