Re: [HACKERS] Postgresql performace question

From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Mark Jones <markj(at)hackerjones(dot)org>
Cc: Rod Taylor <rbt(at)rbt(dot)ca>, Mark Jones <mlist(at)hackerjones(dot)org>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, PostgresSQL General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: [HACKERS] Postgresql performace question
Date: 2003-03-03 21:54:50
Message-ID: 1046728489.3302.17.camel@mouse.copelandconsulting.net
Lists: pgsql-general pgsql-hackers

Something I didn't see mentioned: does your data need to be made
available in real time? Just because you're sampling at 20/s doesn't
mean the data has to be made available at 20 samples per second, or even
1 sample per 1/20th of a second. You might find that it's a little more
resource-friendly to stuff many samples into the same transaction and
commit them together. As an example, take 60 seconds' worth of data
(1200 samples) and send it to the database while continuing to sample.
That may allow for more flexibility in your implementation.
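A minimal sketch of that batching pattern, using Python's stdlib sqlite3 as a stand-in for PostgreSQL (the table name, column names, and sample sizes here are hypothetical, just for illustration; the same one-transaction-per-batch idea applies to PostgreSQL via BEGIN ... COMMIT):

```python
import sqlite3

# In-memory SQLite table standing in for the real packet store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (ts REAL, payload BLOB)")

def flush_batch(conn, batch):
    """Write an accumulated batch of samples in a single transaction."""
    with conn:  # one BEGIN/COMMIT for the whole batch, not one per row
        conn.executemany(
            "INSERT INTO samples (ts, payload) VALUES (?, ?)", batch)

# Simulate 60 seconds of sampling at 20 samples/second = 1200 samples,
# committed together instead of one transaction per sample.
batch = [(i / 20.0, b"\x00" * 300) for i in range(1200)]
flush_batch(conn, batch)

n = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

The win is that the per-commit overhead (fsync, WAL flush) is paid once per batch instead of 20 times a second.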

At any rate, I agree with the other comments. Maintenance issues are
probably going to be the gotcha if you're not forward looking.

Regards,

Greg Copeland

On Sun, 2003-03-02 at 19:33, Mark Jones wrote:
> > The real question is, the data collection is in real-time, but can you
> > have a maintenance window (6 hours a week?) to do things like REINDEX?
>
> The database has to have the last 24 hours of data online and be accessible 24
> hours a day, 7 days a week, 365 days a year -- basically no downtime. My
> application is a weather-related system, which must be available at any moment.
>
> > Will the data purge be spread over time or come in large lumps?
>
> The database that is online would only contain the last 24 hours of data.
> Once 24 hours of data has been collected, the data would begin to be purged --
> say, purging one hour's worth of data every hour.
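A rolling purge like that reduces to one timestamped DELETE run on a timer. Here is a sketch against sqlite3 (hypothetical table and column names; on PostgreSQL the DELETE would typically be followed by periodic VACUUM to reclaim space):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (ts REAL, payload BLOB)")
# Thirty hourly samples, timestamps in seconds since an arbitrary epoch.
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [(h * 3600.0, b"") for h in range(30)])

def purge_older_than(conn, now, window_seconds=24 * 3600):
    """Drop everything outside the trailing 24-hour window; run hourly."""
    with conn:
        conn.execute("DELETE FROM samples WHERE ts < ?",
                     (now - window_seconds,))

purge_older_than(conn, now=29 * 3600.0)
remaining = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

With an index on the timestamp column, each hourly DELETE touches only the expiring hour rather than scanning the whole table.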
>
> > But if it's 20 packets with parts being inserted into 50 tables (now
> > 1000 inserts / second) things could become a little trickier -- but
> > still easily doable.
>
> No, there will be only one table, with each packet having its own record.
> Each packet contains a few header fields that I would like to be able to
> search on. The rest of the data is purely binary data that goes along with
> the header. I have no interest in searching through it, only in being able to
> retrieve it based on its header. One important way of retrieving data would be
> based on time: given that every packet has a timestamp, I would like to select
> a range of records out of the database. There may be cases where I want to
> stream the data back out just the way it was recorded.
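That layout -- a few indexed header columns plus an opaque BLOB payload -- and a timestamp-range query might be sketched like this (sqlite3 again as a stand-in; the schema, field names, and packet sizes are hypothetical, and on PostgreSQL the payload column would be `bytea`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Searchable header fields plus an opaque payload that is never searched.
conn.execute("""CREATE TABLE packets (
    ts      REAL PRIMARY KEY,  -- packet timestamp (the main search key)
    kind    INTEGER,           -- example header field
    payload BLOB               -- raw packet bytes, stored as-is
)""")
# 100 packets at 50 ms intervals, 300-byte payloads.
conn.executemany("INSERT INTO packets VALUES (?, ?, ?)",
                 [(t * 0.05, t % 4, bytes([t % 256]) * 300)
                  for t in range(100)])

# Retrieve a time range in recording order, ready to stream back out.
rows = conn.execute(
    "SELECT ts, payload FROM packets WHERE ts BETWEEN ? AND ? ORDER BY ts",
    (1.0, 2.0)).fetchall()
```

Because the timestamp is the primary key, the range query is an index scan rather than a full-table scan, which is what keeps retrieval fast at millions of rows.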
>
>
> > Can you use a parallel loader or does it need to be loaded sequentially?
>
> I am not exactly sure what you mean by parallel loader, but the data would be
> loaded sequentially into the database just as I receive it from my external
> device. As mentioned above, I may want to sequentially read records back from
> the database to recreate the data stream that I received and recorded. I would
> need to be able to read the records back fast enough to replay them at the
> same rate at which I received them.
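Replaying at the original rate can be done by walking the rows in timestamp order and sleeping out each recorded inter-packet gap. A sketch (hypothetical table name; the `emit` callback and `speed` knob are illustrative, not from the original post):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packets (ts REAL, payload BLOB)")
conn.executemany("INSERT INTO packets VALUES (?, ?)",
                 [(i * 0.05, b"x") for i in range(5)])

def replay(conn, emit, speed=1.0):
    """Stream packets back in timestamp order, pacing output by the
    recorded inter-packet gaps so the original rate is reproduced."""
    prev_ts = None
    for ts, payload in conn.execute(
            "SELECT ts, payload FROM packets ORDER BY ts"):
        if prev_ts is not None:
            time.sleep((ts - prev_ts) / speed)  # wait out the recorded gap
        emit(ts, payload)
        prev_ts = ts

out = []
start = time.monotonic()
replay(conn, lambda ts, p: out.append(ts), speed=2.0)  # replay at 2x
elapsed = time.monotonic() - start
```

As long as the range query keeps up (which an index on the timestamp makes likely at 20 rows/second), the sleep dominates and the stream comes out at the recorded pace.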
>
>
> Thanks Again
> Mark
>
>
> On Sunday 02 March 2003 06:35 pm, Rod Taylor wrote:
> > On Sun, 2003-03-02 at 18:52, Mark Jones wrote:
> > > Hello
> > >
> > > I am working on a project that acquires real-time data from an external
> > > device that I need to store and be able to search through and retrieve
> > > quickly. My application receives packets of data ranging in size from 300
> > > to 5000 bytes every 50 milliseconds for the minimum duration of 24 hours
> > > before the data is purged or archived off disk. There are several fields
> > > in the data that I like to be able to search on to retrieve the data at
> > > later time. Using a SQL database such as PostgreSQL or MySQL seems like
> > > it would make this task much easier. My questions are: is a SQL database
> > > such as PostgreSQL able to handle this kind of activity, saving a record
> > > of 5000 bytes at a rate of 20 times a second? And how well will it
> > > perform at searching through a database which contains nearly two million
> > > records at a size of about 8-9 gigabytes of data, assuming that I have
> > > adequate computing hardware? I am trying to determine whether a SQL
> > > database would work well for this or whether I need to write my own
> > > custom database for this project. If anyone has any experience doing
> > > anything similar with PostgreSQL, I would love to know about your findings.
> >
> > Not such a big load depending on the complexity of the queries being
> > performed. From the sounds of it, they're pretty simple.
> >
> > The real question is, the data collection is in real-time, but can you
> > have a maintenance window (6 hours a week?) to do things like REINDEX?
> >
> > Can you use a parallel loader or does it need to be loaded sequentially?
> >
> > Will the data purge be spread over time or come in large lumps?
> >
> >
> > Simple queries on 2 million records happen in low double digit
> > milliseconds on relatively low end hardware with a decent index -- but
> > complex queries can take minutes on 1/10000th the dataset.
> >
> > 20 inserts / second shouldn't be difficult to achieve even on the lowest
> > end hardware -- my laptop can do that -- giving each packet its own
> > transaction.
> >
> > But if it's 20 packets with parts being inserted into 50 tables (now
> > 1000 inserts / second) things could become a little trickier -- but
> > still easily doable.
> >
> > The most important question is the maintenance window.
--
Greg Copeland <greg(at)copelandconsulting(dot)net>
Copeland Computer Consulting
