Re: Using Postgres to store high volume streams of sensor readings

From: "Ciprian Dorin Craciun" <ciprian(dot)craciun(at)gmail(dot)com>
To: "Michal Szymanski" <dyrex(at)poczta(dot)onet(dot)pl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Using Postgres to store high volume streams of sensor readings
Date: 2008-11-22 15:16:16
Message-ID: 8e04b5820811220716n46968029mc821c0a2cd5c6207@mail.gmail.com
Lists: pgsql-general

On Fri, Nov 21, 2008 at 3:12 PM, Michal Szymanski <dyrex(at)poczta(dot)onet(dot)pl> wrote:
> On 21 Lis, 13:50, ciprian(dot)crac(dot)(dot)(dot)(at)gmail(dot)com ("Ciprian Dorin Craciun")
> wrote:
>> Hello all!
>>
>> I would like to ask some advice about the following problem
>> (related to the Dehems project:http://www.dehems.eu/):
>> * there are some clients; (the clients are in fact households;)
>> * each device has a number of sensors (about 10), and not all the
>> clients have the same sensors; also sensors might appear and disappear
>> dynamically; (the sensors are appliances;)
>> * for each device and each sensor a reading is produced (about
>> every 6 seconds); (the values could be power consumptions;)
>> * I would like to store the following data: (client, sensor,
>> timestamp, value);
>> * the usual queries are:
>> * for a given client (and sensor), and time interval, I need
>> the min, max, and avg of the values;
>> * for a given time interval (and sensor), I need min, max, and
>> avg of the values;
>> * other statistics;
>
> How many devices do you expect?
> As I understand it, the expected number of inserts is more or less:
> no. of devices * no. of sensors (about 10)
> every 6 seconds. Let's assume you have 100 devices: that means 1000
> inserts per 6 s = 166 inserts per second.

Yes, the figures are like this:
    * average number of raw inserts / second (without any optimization
or previous aggregation): #clients (~ 100 thousand) * #sensors (~ 10)
/ 6 seconds = ~166 thousand inserts / second...
    * if I use sharding, this number would drop linearly with the
number of Postgres instances... so let's say I use about 10 thousand
users / Postgres instance => ~16 thousand inserts / second... (a figure
which I wasn't able to reach in my Postgres benchmarks...)

Either way, I would expect at least 2-3 thousand inserts per second...
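In numbers, the arithmetic above works out as follows (a quick sanity check of the quoted figures):

```python
# Sanity check of the insert-rate figures quoted above.
clients = 100_000        # ~100 thousand households
sensors_per_client = 10  # ~10 sensors (appliances) per client
interval_s = 6           # one reading per sensor every ~6 seconds

raw_rate = clients * sensors_per_client / interval_s
print(f"raw: ~{raw_rate:,.0f} inserts/s")  # ~166,667 inserts/s

# With sharding at ~10 thousand users per Postgres instance:
users_per_instance = 10_000
per_instance_rate = users_per_instance * sensors_per_client / interval_s
print(f"per instance: ~{per_instance_rate:,.0f} inserts/s")  # ~16,667 inserts/s
```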

>> * inserts are done like this:
>> * generated 100 million readings by using the following rule:
>> * client is randomly chosen between 0 and 10 thousand;
>> * sensor is randomly chosen between 0 and 10;
>> * the timestamp is always increasing by one;
>> * the insert is done in batches of 500 thousand inserts (I've
>> also tried 5, 25, 50 and 100 thousand without big impact);
>> * the batch inserts are done through COPY sds_benchmark_data
>> FROM STDIN through libpq (by using UNIX (local) sockets);
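(For reference, the batch-COPY path described above can be sketched as below. This is a minimal sketch that only builds the tab-separated COPY payload in memory following the stated generation rule; the table and column names are assumptions from the thread, and in reality the buffer would be streamed through libpq's COPY protocol rather than kept in a StringIO.)

```python
import io
import random

def build_copy_batch(n_rows, start_ts, n_clients=10_000, n_sensors=10):
    """Generate n_rows readings following the benchmark's rule and format
    them as tab-separated text suitable for feeding to
    COPY sds_benchmark_data (client, sensor, timestamp, value) FROM STDIN."""
    buf = io.StringIO()
    ts = start_ts
    for _ in range(n_rows):
        client = random.randrange(n_clients)  # client in [0, 10 thousand)
        sensor = random.randrange(n_sensors)  # sensor in [0, 10)
        value = random.random() * 1000        # e.g. a power consumption
        buf.write(f"{client}\t{sensor}\t{ts}\t{value}\n")
        ts += 1                               # timestamp always increases by one
    buf.seek(0)
    return buf

# In production this buffer would be streamed to the server, e.g. with
# psycopg2: cursor.copy_from(batch, "sds_benchmark_data", ...).
batch = build_copy_batch(500, start_ts=0)
```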
>
>> What have I observed / tried:
>> * I've tested without the primary key and the index, and the
>> results were the best for inserts (600k inserts / s), but the
>> reads worked extremely slowly (due to the lack of indexing);
>> * with only the index (or only the primary key) the insert rate is
>> good at start (for the first 2 million readings), but then drops to
>> about 200 inserts / s;
>
>
> Try periodically executing REINDEX on your index, and ANALYZE on
> your table. To be honest, it should not influence inserts, but it
> will influence selects.

I'll keep this in mind when I reach the select part... For the
moment I'm struggling with inserts... (Actually I've kind of given
up...)
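(As for the select part: the interval statistics listed in the original mail reduce to a single aggregate query. A runnable sketch, using stdlib sqlite3 as a stand-in for Postgres; the table and column names are assumptions from the thread, and the same SQL shape would run unchanged on Postgres:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sds_benchmark_data ("
    "client INTEGER, sensor INTEGER, timestamp INTEGER, value REAL)"
)
# A few sample readings for one client/sensor pair.
rows = [(1, 2, ts, float(ts * 10)) for ts in range(10)]
conn.executemany("INSERT INTO sds_benchmark_data VALUES (?, ?, ?, ?)", rows)

# Min, max, and avg for a given client, sensor, and time interval.
lo, hi = 2, 7
row = conn.execute(
    "SELECT min(value), max(value), avg(value) "
    "FROM sds_benchmark_data "
    "WHERE client = ? AND sensor = ? AND timestamp BETWEEN ? AND ?",
    (1, 2, lo, hi),
).fetchone()
print(row)  # (20.0, 70.0, 45.0)
```

Dropping the `client` predicate gives the second query family (statistics per sensor over a time interval).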

> Michal Szymanski
> http://blog.szymanskich.net
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Thanks,
Ciprian Craciun.
