Re: High Frequency Inserts to Postgres Database vs Writing to a File

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jay Manni <JManni(at)fireeye(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: High Frequency Inserts to Postgres Database vs Writing to a File
Date: 2009-11-04 13:59:25
Message-ID: b42b73150911040559w660d1ad2w38a014fbc34e877f@mail.gmail.com
Lists: pgsql-performance

On Tue, Nov 3, 2009 at 10:12 PM, Jay Manni <JManni(at)fireeye(dot)com> wrote:
> Hi:
>
> I have an application wherein a process needs to read data from a stream and
> store the records for further analysis and reporting. The data in the stream
> is in the form of variable length records with clearly defined fields – so
> it can be stored in a database or in a file. The only caveat is that the
> rate of records coming in the stream could be several 1000 records a second.

Whether Postgres can do this is going to depend primarily on two things:
*) Your hardware
*) The mechanism you use to insert the data into the database

Postgres can handle several thousand inserts/sec, but your hardware most
likely can't handle several thousand transactions/sec if fsync is on. You
definitely want to batch the inserts somehow: have something accumulate
the data (it could be a simple file) and flush it into the database. The
flush should ideally use COPY, but a multi-row insert is ok too. Try to
avoid inserting one row at a time, even inside a transaction.
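
A minimal sketch of the accumulate-and-flush approach in Python with
psycopg2 (the DSN, table name, columns, batch size, and the read_stream()
stand-in are all hypothetical -- adjust to your schema and reader):

    import io
    import psycopg2

    conn = psycopg2.connect("dbname=streamdb")   # hypothetical DSN
    BATCH_SIZE = 5000                            # flush every N records

    def read_stream():
        """Stand-in for the real stream reader; yields (ts, source, payload) tuples."""
        return iter([])

    def flush(records, conn):
        """Write a whole batch with COPY instead of row-at-a-time INSERTs."""
        buf = io.StringIO()
        for rec in records:
            # one tab-separated line per record; assumes fields contain no tabs/newlines
            buf.write("\t".join(str(f) for f in rec) + "\n")
        buf.seek(0)
        with conn.cursor() as cur:
            cur.copy_from(buf, "stream_log", columns=("ts", "source", "payload"))
        conn.commit()

    batch = []
    for record in read_stream():
        batch.append(record)
        if len(batch) >= BATCH_SIZE:
            flush(batch, conn)
            batch = []
    if batch:
        flush(batch, conn)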

If you are bulk inserting 1000+ records/sec all day long, make sure
you have provisioned enough storage for this (that's 86M+ records/day),
and you should immediately start thinking about partitioning and
rotating the log table (if you log to the database, partitioning and
rotation are basically already baked in anyway).
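
As a rough sketch of the rotation idea, using the inheritance-based
partitioning available in 8.x-era Postgres (the parent table stream_log,
the ts column, and the naming scheme are hypothetical; the loader would
COPY directly into the current child table):

    import datetime
    import psycopg2

    def create_next_partition(conn, parent="stream_log"):
        """Create tomorrow's child table ahead of time, e.g. from a daily cron job."""
        day = datetime.date.today() + datetime.timedelta(days=1)
        nxt = day + datetime.timedelta(days=1)
        child = "%s_%s" % (parent, day.strftime("%Y%m%d"))
        ddl = ("CREATE TABLE %s (CHECK (ts >= '%s' AND ts < '%s')) "
               "INHERITS (%s)" % (child, day.isoformat(), nxt.isoformat(), parent))
        with conn.cursor() as cur:
            cur.execute(ddl)
        conn.commit()

    # "rotating" old data out is then just dropping the oldest child table,
    # e.g. DROP TABLE stream_log_20091004;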

The effects on other users of the database are really hard to predict
-- it's going to depend on how many resources (cpu and especially
disk) you can direct towards the loading and how the database is being
used. I expect it shouldn't be too bad unless your database is already
i/o loaded. The good news is that testing this is relatively easy: you
can simulate the load, run it during typical use, and see how it
affects other users. Standard o/s tools (iostat, top) and the database
log with log_min_duration_statement set are going to be a big help
here. If you start seeing big leaps in iowait corresponding with
unexpectedly lagging queries in your app, you probably should think
about scrapping the idea.
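
For example (the threshold and polling interval below are just
assumptions, pick whatever suits your workload):

    # postgresql.conf: log any statement that takes longer than 250 ms
    log_min_duration_statement = 250

    # watch disk utilization and iowait while the simulated load runs
    iostat -x 5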

merlin
