Re: Using Postgres to store high volume streams of sensor readings

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Ciprian Dorin Craciun" <ciprian(dot)craciun(at)gmail(dot)com>
Cc: m_lists(at)yahoo(dot)it, pgsql-general(at)postgresql(dot)org
Subject: Re: Using Postgres to store high volume streams of sensor readings
Date: 2008-11-23 18:23:55
Message-ID: dcc563d10811231023g4d9c91b2yebdec169ff2f2b4@mail.gmail.com
Lists: pgsql-general

On Sun, Nov 23, 2008 at 10:01 AM, Ciprian Dorin Craciun
<ciprian(dot)craciun(at)gmail(dot)com> wrote:
> On Sun, Nov 23, 2008 at 3:28 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> * Ciprian Dorin Craciun (ciprian(dot)craciun(at)gmail(dot)com) wrote:
>>> > Even better might be partitioning on the timestamp. IF all access is
>>> > in a certain timestamp range it's usually a big win, especially
>>> > because he can move to a new table every hour / day / week or whatever
>>> > and merge the old one into a big "old data" table.
>>>
>>> Yes, it would speed up the inserts tremendously... I've tested it
>>> and the insert speed is somewhere around 200k->100k.
>>>
>>> But unfortunately the query speed is not good at all because most
>>> queries are for a specific client (and sensor) in a given time
>>> range...
>>
>> Have you set up your partitions correctly (eg, with appropriate CHECK
>> constraints and with constraint_exclusion turned on)? Also, you'd want
>> to keep your indexes on the individual partitions, of course.. That
>> should improve query time quite a bit since it should only be hitting
>> the partitions where the data might be.
>>
>> Stephen
>>
>
> Well, now that I've read the previous two emails better, I
> understand what Scott and Stephen are talking about...
>
> So if I understood it correctly: I should build indexes only for
> certain parts of the data (like the previous full hour, and so on).
> But I see a problem: wouldn't this lead to a lot of indices being
> created (24 a day, ~170 a week, ...)?

No, not exactly what I'm talking about. I'm talking about
pre-creating partitions that the data will soon go into (let's say a
new one every hour) with indexes in place, and having a trigger that
fires on insert to put the data into the right partition. Once that
partition is no longer being inserted into, and we aren't running a
bunch of queries on it, we migrate it to a historical partition.
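
A minimal sketch of that setup, assuming the inheritance-based
partitioning of the 8.x era with constraint_exclusion = on (all table
and column names here are invented for illustration, and the partition
bounds are hard-coded to keep it short):

-- Parent table; it holds no rows itself.
CREATE TABLE readings (
    client_id integer   NOT NULL,
    sensor_id integer   NOT NULL,
    ts        timestamp NOT NULL,
    value     real      NOT NULL
);

-- An hourly partition pre-created before any rows arrive, with its
-- CHECK constraint and index already in place.  With
-- constraint_exclusion = on, the planner skips partitions whose
-- CHECK range can't match the query.
CREATE TABLE readings_2008112318 (
    CHECK (ts >= '2008-11-23 18:00' AND ts < '2008-11-23 19:00')
) INHERITS (readings);

CREATE INDEX readings_2008112318_idx
    ON readings_2008112318 (client_id, sensor_id, ts);

-- Trigger that routes each insert into the right partition.
CREATE OR REPLACE FUNCTION readings_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.ts >= '2008-11-23 18:00' AND NEW.ts < '2008-11-23 19:00' THEN
        INSERT INTO readings_2008112318 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for timestamp %', NEW.ts;
    END IF;
    RETURN NULL;  -- the row is already stored in the child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER readings_route
    BEFORE INSERT ON readings
    FOR EACH ROW EXECUTE PROCEDURE readings_insert();

In practice the trigger would dispatch over several partitions (or be
regenerated whenever a new partition is created) rather than
hard-coding a single range.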

So, your table looks something like this all the time:

|**|^^|##|##|##|##|$$$$$$$$$$$$$$...|

Where:

** is a partition we have created in advance of needing it.
^^ is the partition we are currently writing to.
## are the partitions we're still hitting a lot with select queries.
$$$... is the old data stuffed into the monolithic history table.

When it's time to switch to writing to the new partition (i.e. **), we
make a new one ahead of that, the trigger starts writing to what was
the ** partition but is now the new ^^, and the old ^^ becomes a ##.
At the end of the day / week / whatever, we take all the old ##
partitions, move their data into the $$$ table, and drop the ##
partitions.
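
The roll-up step itself can be as simple as this sketch (continuing
the invented names from above; readings_history is assumed to already
exist with the same columns):

-- Move a finished partition's rows into the big history table and
-- drop the partition, all in one transaction so readers never see a
-- half-moved state.
BEGIN;
INSERT INTO readings_history SELECT * FROM readings_2008112318;
DROP TABLE readings_2008112318;
COMMIT;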

Note that the only reason we move old data into an archive table at
all is to keep from having hundreds or thousands of partitions:
somewhere past a few hundred partitions, things start getting slow
again due to planner overhead.
