Skip site navigation (1) Skip section navigation (2)

Re: Auto-clustering?

From: "Pierre C" <lists(at)peufeu(dot)com>
To: pgsql-performance(at)postgresql(dot)org, "Royce Ausburn" <royce(at)inomial(dot)com>
Subject: Re: Auto-clustering?
Date: 2010-12-17 10:20:01
Message-ID: op.vnusbnnqeorkce@apollo13 (view raw or flat)
Thread:
Lists: pgsql-performance
> fc=# explain  analyse select collection, period, tariff, sum(bytesSent),  
> sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600  
> as startchunk from sample_20101001 where starttime between 1287493200  
> and 1290171599  and collection=128    and ip = '10.9.125.207' group by  
> startchunk, tariff, collection,  
> period;

If CLUSTER locks bother you, and you don't do UPDATEs, you might consider  
doing something like this :

- accumulate the rows in a "recent" table
- every hour, INSERT INTO archive SELECT * FROM recent ORDER BY (your  
cluster fields)
- DELETE FROM recent the rows you just inserted
- VACUUM recent

The cluster in your archive table will not be perfect but at least all  
rows from 1 source in 1 hour will be stored close together. But clustering  
doesn't need to be perfect either, if you get 100x better locality, that's  
already good !

Now, if you have a huge amount of data but never query it with a precision  
exceeding 1 hour, you might consider creating an aggregate table where, at  
the end of every hour, you only store sum(), min(), max() of the data for  
the last hour's data using GROUP BY the fields you want. You could also  
use a trigger, but that would generate a huge amount of UPDATEs.

For the above query you'd do :

INSERT INTO stats_by_hour (columns...) SELECT
collection, ip, period, tariff, sum(bytesSent),
sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600
as startchunk from sample_20101001 WHERE starttime > some value
GROUP BY collection, ip, period, tariff, startchunk

Then you can run aggregates against this much smaller table instead.

In response to

Responses

pgsql-performance by date

Next:From: selvi88Date: 2010-12-17 10:48:42
Subject: Re: postgres performance tunning
Previous:From: Filip RembiałkowskiDate: 2010-12-17 10:19:36
Subject: Re: Auto-clustering?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group