Re: Best way to handle multi-billion row read-only table?

From: Asher <asher(at)piceur(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Best way to handle multi-billion row read-only table?
Date: 2010-02-10 14:32:48
Message-ID: 4B72C390.1040004@piceur.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

John R Pierce wrote:
> how do you plan on accessing this monster data? do you expect to be
> looking up single values or small set of values at a specific time?
> seems to me like this is the sort of data thats more often processed in
> the aggregate, like running a fourier analysis of sliding windows, and
> that sort of data processing may well be more efficiently done with
> fixed block binary files rather than relational databases, as there's no
> real relationships in this data.

The data will initially be accessed via a simple GUI which will allow
browsing over a subset of the data (subsampled down to 1
sample/minute/hour, etc. during the data load phase and so massively
smaller datasets) and then once something "interesting" has been found
manually (fully automatic detection of transients has to wait until
we've formally described what we mean by "transient" :-)) the start and
end times can be handed over to our automatic processing code to go
through the full dataset.

I did consider just sticking the data into a series of big dumb files
but by putting them in a DB I can both maintain automatic links between
the full and subsampled data sets and between each data point and the
equipment that measured it and, possibly more importantly, I can provide
a simpler interface to the other people on my project to access the
data. I'm a computer scientist but I'm doing my PhD in the Civil
Engineering dept and all of my colleagues are civil engineers - all
quite happy using Matlab's database plugin but less happy writing
traditional code to crunch through raw files. I'm aware that I'm taking
a, possibly quite large, performance hit by using a database but I'm
hoping that the advantages will outweigh this.

Many thanks for all the replies to my query. I'm going to go with a
partitioned table design and start uploading some data. I'll post how it
performs once I've got some real size data in it.

Asher.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ben Campbell 2010-02-10 14:49:23 Re: problems maintaining boolean columns in a large table
Previous Message Pavel Stehule 2010-02-10 14:31:07 Re: Orafce concat operator