
Re: Postgresql performance question

From: Mark Jones <markj(at)hackerjones(dot)org>
To: Rod Taylor <rbt(at)rbt(dot)ca>, Mark Jones <mlist(at)hackerjones(dot)org>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>,pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql performance question
Date: 2003-03-03 01:33:55
Message-ID: 200303021933.55642.markj@hackerjones.org
Lists: pgsql-general, pgsql-hackers
> The real question is, the data collection is in real-time, but can you
> have a maintenance window (6 hours a week?) to do things like REINDEX?

The database has to keep the last 24 hours of data online and be accessible 24
hours a day, 7 days a week, 365 days a year; basically no downtime. My
application is a weather-related system, which must be available at any moment.

> Will the data purge be spread over time or come in large lumps?

The database that is online would only contain the last 24 hours of data.
Once 24 hours of data has been collected, the data would begin to be purged,
say purging one hour's worth of data every hour.
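A rolling hourly purge like that could be sketched in SQL roughly as follows (the table and column names are illustrative, not from the original discussion):

```sql
-- Hypothetical packets table; run once an hour, e.g. from cron.
-- Deletes the hour of data that has aged past the 24-hour window.
DELETE FROM packets
WHERE recorded_at < now() - interval '24 hours';

-- Reclaim the dead rows afterwards so the table does not grow
-- without bound between maintenance windows.
VACUUM ANALYZE packets;
```

Whether the VACUUM can keep up with a steady 20-inserts-per-second load is exactly the kind of thing the maintenance-window question above is probing.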

> But if it's 20 packets with parts being inserted into 50 tables (now
> 1000 inserts / second) things could become a little trickier -- but
> still easily doable.

No, there will be only one table, with each packet having its own record.
Each packet contains a few header fields that I would like to be able to
search on. The rest of the data is purely binary data that goes along with
the header. I have no interest in searching through it, only in being able to
retrieve it based on its header. One important way of retrieving data would be
based on time: since every packet has a timestamp, I would like to select a
range of records out of the database. There may also be cases where I want to
stream the data back out just the way it was recorded.
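That single-table layout might look something like this (a minimal sketch; the field names and types are assumptions for illustration):

```sql
-- Illustrative single-table design: a few searchable header
-- fields plus the raw packet body stored as opaque binary data.
CREATE TABLE packets (
    recorded_at  timestamptz NOT NULL,  -- packet timestamp from the device
    packet_type  integer     NOT NULL,  -- example searchable header field
    payload      bytea       NOT NULL   -- raw binary packet body
);

-- An index on the timestamp makes range retrieval fast.
CREATE INDEX packets_recorded_at_idx ON packets (recorded_at);

-- Select all packets within a time range, in capture order.
SELECT recorded_at, payload
FROM packets
WHERE recorded_at >= '2003-03-02 00:00'
  AND recorded_at <  '2003-03-02 01:00'
ORDER BY recorded_at;
```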


> Can you use a parallel loader or does it need to be loaded sequentially?

I am not exactly sure what you mean by a parallel loader, but the data would be
loaded sequentially into the database just as I receive it from my external
device. As mentioned above, I may want to be able to sequentially read records
from the database to recreate the data stream that I received and recorded. I
would need to be able to read the records back fast enough to replay them at
the same rate at which I received them.
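For that kind of replay, one way to read the records back in capture order without pulling the whole 8-9 GB result set into the client at once is a cursor (a sketch; cursor and table names are illustrative):

```sql
-- Stream records back in capture order, fetching in batches
-- rather than materializing the entire result set.
BEGIN;
DECLARE replay CURSOR FOR
    SELECT recorded_at, payload
    FROM packets
    ORDER BY recorded_at;

-- The client fetches a batch at a time and paces its output
-- to match the original 50 ms inter-packet spacing.
FETCH 1000 FROM replay;
-- ...repeat FETCH until no rows remain...
COMMIT;
```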


Thanks Again
Mark


On Sunday 02 March 2003 06:35 pm, Rod Taylor wrote:
> On Sun, 2003-03-02 at 18:52, Mark Jones wrote:
> > Hello
> >
> > I am working on a project that acquires real-time data from an external
> > device that I need to store and be able to search through and retrieve
> > quickly. My application receives packets of data ranging in size from 300
> > to 5000 bytes every 50 milliseconds for the minimum duration of 24 hours
> > before the data is purged or archived off disk. There are several fields
> > in the data that I would like to be able to search on to retrieve the data
> > at a later time. It seems that using a SQL database such as Postgresql or
> > Mysql would make this task much easier. My questions are: is a SQL
> > database such as Postgresql able to handle this kind of activity, saving a
> > record of 5000 bytes at a rate of 20 times a second, and how well will it
> > perform at searching through a database which contains nearly two million
> > records at a size of about 8 - 9 gigabytes of data, assuming that I have
> > adequate computing hardware. I am trying to determine if a SQL database
> > would work well for this or if I need to write my own custom database for
> > this project. If anyone has any experience in doing anything similar with
> > Postgresql  I would love to know about your findings.
>
> Not such a big load depending on the complexity of the queries being
> performed.  From the sounds of it, they're pretty simple.
>
> The real question is, the data collection is in real-time, but can you
> have a maintenance window (6 hours a week?) to do things like REINDEX?
>
> Can you use a parallel loader or does it need to be loaded sequentially?
>
> Will the data purge be spread over time or come in large lumps?
>
>
> Simple queries on 2 million records happen in low double digit
> milliseconds on relatively low end hardware with a decent index -- but
> complex queries can take minutes on 1/10000th the dataset.
>
> 20 inserts / second shouldn't be difficult to achieve even on the lowest
> end hardware -- my laptop can do that -- giving each packet its own
> transaction.
>
> But if it's 20 packets with parts being inserted into 50 tables (now
> 1000 inserts / second) things could become a little trickier -- but
> still easily doable.
>
> The most important question is the maintenance window.

-- 
Mark Jones
http://www.hackerjones.org


