
Re: [HACKERS] Postgresql performance question

From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Mark Jones <markj(at)hackerjones(dot)org>
Cc: Rod Taylor <rbt(at)rbt(dot)ca>, Mark Jones <mlist(at)hackerjones(dot)org>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: [HACKERS] Postgresql performance question
Date: 2003-03-03 21:54:50
Message-ID: 1046728489.3302.17.camel@mouse.copelandconsulting.net
Lists: pgsql-general, pgsql-hackers
Something I didn't see mentioned: does your data need to be made
available in real time?  Just because you're sampling at 20/s doesn't
mean the data has to be made available at 20 samples per second, or even
1 sample per 1/20th of a second.  I mention this because you might find
it a little more resource-friendly to stuff many samples into the
same transaction and commit them together.  As an example, take
60 seconds' worth of data (1200 samples) and send it to the database
while continuing to sample.  This may allow for more flexibility in
your implementation.
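To make the batching idea concrete, here's a minimal sketch.  It uses
Python's built-in sqlite3 module purely for illustration; the table name,
columns, and payload size are made up, but the same pattern (one commit per
batch instead of one commit per sample) applies to PostgreSQL through any
client driver:

```python
import sqlite3

# Hypothetical schema: one row per sampled packet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packets (ts REAL, header TEXT, payload BLOB)")

# ~60 seconds of samples at 20/s, buffered in memory while sampling continues.
samples = [(i * 0.05, "hdr", b"\x00" * 300) for i in range(1200)]

# One transaction for the whole 60-second batch instead of 1200 commits.
with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO packets VALUES (?, ?, ?)", samples)

print(conn.execute("SELECT COUNT(*) FROM packets").fetchone()[0])  # → 1200
```

The win is that the per-commit overhead (most notably the synchronous WAL
flush) is paid once per batch rather than once per sample.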

At any rate, I agree with the other comments.  Maintenance issues are
probably going to be the gotcha if you're not forward-looking.


Regards,

	Greg Copeland


On Sun, 2003-03-02 at 19:33, Mark Jones wrote:
> > The real question is, the data collection is in real-time, but can you
> > have a maintenance window (6 hours a week?) to do things like REINDEX?
> 
> The database has to keep the last 24 hours of data online and be accessible
> 24 hours a day, 7 days a week, 365 days a year; basically no downtime. My
> application is a weather-related system, which must be available at any moment.
> 
> > Will the data purge be spread over time or come in large lumps?
> 
> The database that is online would only contain the last 24 hours of data.
> Once 24 hours of data has been collected, the data would begin to be purged,
> say one hour's worth of data every hour.
> 
> > But if it's 20 packets with parts being inserted into 50 tables (now
> > 1000 inserts / second) things could become a little trickier -- but
> > still easily doable.
> 
> No, there will be only one table, with each packet having its own record.
> Each packet contains a few header fields that I would like to be able to
> search on. The rest of the data is purely binary data that goes along with
> the header. I have no interest in searching through it, only in being able to
> retrieve it based on its header. One important way of retrieving data would be
> based on time: since every packet has a timestamp, I would like to select a
> range of records out of the database. There may be cases where I want to
> stream the data back out just the way it was recorded.
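A timestamp-range lookup like that is just an indexed range scan.  A sketch
(sqlite3 again as a stand-in; integer-millisecond timestamps and all names are
assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packets (ts INTEGER, payload BLOB)")
conn.execute("CREATE INDEX packets_ts ON packets (ts)")  # makes range scans cheap

# ~60 seconds of packets, one every 50 ms.
conn.executemany("INSERT INTO packets VALUES (?, ?)",
                 [(i * 50, b"data") for i in range(1200)])

# Pull one 10-second window, in recorded order, ready to replay as a stream.
rows = conn.execute(
    "SELECT ts, payload FROM packets WHERE ts BETWEEN ? AND ? ORDER BY ts",
    (20000, 30000)).fetchall()
print(len(rows))  # → 201
```

As long as the timestamp column is indexed, selecting a time window out of a
couple of million rows stays cheap, and ORDER BY ts gives back the original
recording order for replay.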
> 
> 
> > Can you use a parallel loader or does it need to be loaded sequentially?
> 
> I am not exactly sure what you mean by a parallel loader, but the data would
> be loaded sequentially into the database just as I receive it from my external
> device. As mentioned above, I may want to be able to sequentially read records
> from the database to recreate the data stream that I received and recorded.
> I would need to be able to read the records back fast enough to replay them
> at the same rate at which I received them.
> 
> 
> Thanks Again
> Mark
> 
> 
> On Sunday 02 March 2003 06:35 pm, Rod Taylor wrote:
> > On Sun, 2003-03-02 at 18:52, Mark Jones wrote:
> > > Hello
> > >
> > > I am working on a project that acquires real-time data from an external
> > > device that I need to store and be able to search through and retrieve
> > > quickly. My application receives packets of data ranging in size from 300
> > > to 5000 bytes every 50 milliseconds, for a minimum duration of 24 hours
> > > before the data is purged or archived off disk. There are several fields
> > > in the data that I would like to be able to search on to retrieve the data
> > > at a later time. It seems that using a SQL database such as PostgreSQL or
> > > MySQL would make this task much easier. My questions are: is a SQL
> > > database such as PostgreSQL able to handle this kind of activity, saving a
> > > record of 5000 bytes at a rate of 20 times a second, and how well will it
> > > perform at searching through a database which contains nearly two million
> > > records, at a size of about 8-9 gigabytes of data, assuming that I have
> > > adequate computing hardware? I am trying to determine whether a SQL
> > > database would work well for this or whether I need to write my own custom
> > > database for this project. If anyone has any experience doing anything
> > > similar with PostgreSQL, I would love to know about your findings.
> >
> > Not such a big load depending on the complexity of the queries being
> > performed.  From the sounds of it, they're pretty simple.
> >
> > The real question is, the data collection is in real-time, but can you
> > have a maintenance window (6 hours a week?) to do things like REINDEX?
> >
> > Can you use a parallel loader or does it need to be loaded sequentially?
> >
> > Will the data purge be spread over time or come in large lumps?
> >
> >
> > Simple queries on 2 million records happen in low double digit
> > milliseconds on relatively low end hardware with a decent index -- but
> > complex queries can take minutes on 1/10000th the dataset.
> >
> > 20 inserts / second shouldn't be difficult to achieve even on the lowest
> > end hardware -- my laptop can do that -- giving each packet its own
> > transaction.
> >
> > But if it's 20 packets with parts being inserted into 50 tables (now
> > 1000 inserts / second) things could become a little trickier -- but
> > still easily doable.
> >
> > The most important question is the maintenance window.
-- 
Greg Copeland <greg(at)copelandconsulting(dot)net>
Copeland Computer Consulting

