Quick Links

Re: best db schema for time series data?

From:	Robert Klemme <shortcutter(at)googlemail(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: best db schema for time series data?
Date:	2010-11-20 00:16:23
Message-ID:	AANLkTik5OULt2gK+Z45OMWTOd_kbEPVkMp5fLKTokh2P@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Fri, Nov 19, 2010 at 10:50 AM, Louis-David Mitterrand
<vindex+lists-pgsql-performance(at)apartia(dot)org> wrote:
> On Tue, Nov 16, 2010 at 05:28:19PM +0100, Harald Fuchs wrote:
>> In article <4CE2688B(dot)2050000(at)tweakers(dot)net>,
>> Arjen van der Meijden <acmmailing(at)tweakers(dot)net> writes:
>>
>> > On 16-11-2010 11:50, Louis-David Mitterrand wrote:
>> >> I have to collect lots of prices from web sites and keep track of their
>> >> changes. What is the best option?
>> >>
>> >> 1) one 'price' row per price change:
>> >>
>> >> create table price (
>> >> id_price primary key,
>> >> id_product integer references product,
>> >> price integer
>> >> );
>> >>
>> >> 2) a single 'price' row containing all the changes:
>> >>
>> >> create table price (
>> >> id_price primary key,
>> >> id_product integer references product,
>> >> price integer[] -- prices are 'pushed' on this array as they change
>> >> );
>> >>
>> >> Which is bound to give the best performance, knowing I will often need
>> >> to access the latest and next-to-latest prices?
>>
>> > If you mostly need the last few prices, I'd definitaly go with the
>> > first aproach, its much cleaner. Besides, you can store a date/time
>> > per price, so you know when it changed. With the array-approach that's
>> > a bit harder to do.
>>
>> I'd probably use a variant of this:
>>
>> CREATE TABLE prices (
>> pid int NOT NULL REFERENCES products,
>> validTil timestamp(0) NULL,
>> price int NOT NULL,
>> UNIQUE (pid, validTil)
>> );
>>
>> The current price of a product is always the row with validTil IS NULL.
>> The lookup should be pretty fast because it can use the index of the
>> UNIQUE constraint.

Even better: with a partial index lookup should be more efficient and
probably will stay that way even when the number of prices increases
(and the number of products stays the same). With

CREATE UNIQUE INDEX current_prices
ON prices (
pid
)
WHERE validTil IS NULL;

I get

robert=> explain select price from prices where pid = 12344 and
validTil is null;
QUERY PLAN
-----------------------------------------------------------------------------
Index Scan using current_prices on prices (cost=0.00..8.28 rows=1 width=4)
Index Cond: (pid = 12344)
(2 rows)

The index can actually be used here.

(see attachment)

> The validTil idea is nice, but you have to manage that field with a
> trigger, right?

Well, you don't need to. You can always do

begin;
update prices set validTil = current_timestamp
where pid = 123 and validTil is NULL;
insert into prices values ( 123, null, 94 );
commit;

But with a trigger it would be more convenient of course.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Attachment	Content-Type	Size
price.sql	text/x-sql	532 bytes

In response to

Re: best db schema for time series data? at 2010-11-19 09:50:21 from Louis-David Mitterrand

Responses

Re: best db schema for time series data? at 2010-11-20 16:27:51 from Bob Lunney

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Alvaro Herrera	2010-11-20 01:49:25	Re: autovacuum blocks the operations of other manual vacuum
Previous Message	Chris Browne	2010-11-19 17:13:58	Re: best db schema for time series data?