From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Best way to handle multi-billion row read-only table?
Date: 2010-02-09 22:08:09
Message-ID: 4B71DCC9.8000402@hogranch.com
Lists: pgsql-general

Asher Hoskins wrote:
> If I partition so that each partition holds data for a single channel
> (and set a CHECK constraint for this) then I can presumably remove the
> channel from the index since constraint exclusion will mean that only
> partitions holding the channel I'm interested in will be searched in a
> query. Given that within a partition all of my sample_time values will
> be different, do you know if there's a more efficient way to index these?

How do you plan on accessing this monster dataset? Do you expect to be
looking up single values, or small sets of values, at a specific time?

Seems to me like this is the sort of data that's more often processed in
the aggregate, like running a Fourier analysis over sliding windows, and
that sort of processing may well be done more efficiently with
fixed-block binary files rather than a relational database, as there are
no real relationships in this data.

For instance: a directory for each sensor, a subdirectory for each week,
and a file for each hour, containing that hour's worth of samples in
fixed-size binary blocks after a file header identifying it. You can then
randomly access the sample for a specific time with fseek, seeking to
(sampletime - starttimeofblock) * blocksize + headersize, or whatever.
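
Roughly like this (an untested sketch in C; the one-double-per-sample
layout, the fixed sample rate, and the 64-byte header size are just
assumptions for illustration, not anything from your schema):

#include <stdio.h>
#include <time.h>

#define HEADER_SIZE 64               /* assumed fixed header length */
#define SAMPLE_SIZE sizeof(double)   /* assumed one double per sample */
#define SAMPLE_RATE 1                /* assumed samples per second */

/* read the sample taken at 'sampletime' from an hour file whose first
   sample was taken at 'starttime'; returns 0 on success */
int read_sample(const char *path, time_t starttime, time_t sampletime,
                double *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    /* offset = (sampletime - starttimeofblock) * blocksize + headersize */
    long offset = (long) (sampletime - starttime) * SAMPLE_RATE
                  * (long) SAMPLE_SIZE + HEADER_SIZE;

    if (fseek(f, offset, SEEK_SET) != 0
        || fread(out, SAMPLE_SIZE, 1, f) != 1)
    {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}

No index needed at all: the file path encodes the sensor and the
week/hour, and the offset arithmetic replaces the per-row index lookup.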
