
Re: performance for high-volume log insertion

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: david(at)lang(dot)hm
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Date: 2009-04-21 01:55:15
Message-ID: 20090421015515.GR8123@tamriel.snowman.net
Lists: pgsql-performance
David,

* david(at)lang(dot)hm (david(at)lang(dot)hm) wrote:
> I am working with the rsyslog developers to improve its performance in  
> inserting log messages to databases.

Great!

> currently they have a postgres interface that works like all the other  
> ones, where rsyslog formats an insert statement, passes that to the  
> interface module, which sends it to postgres (yes, each log as a separate  
> transaction)

Ouch.

> the big win is going to be in changing the core of rsyslog so that it can 
> process multiple messages at a time (bundling them into a single  
> transaction)

Yup.

> 1. begin; insert; insert;...;end

Doing the inserts in a single transaction should definitely improve your
performance.  Doing them as prepared statements would be good too, and
using binary mode would very likely help.
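For example (the "logs" table and its columns here are made up), option 1
with a server-side prepared statement might look like:

```sql
-- Sketch only: one round-trip per EXECUTE, but a single commit
-- covers the whole batch instead of one commit per log message.
BEGIN;
PREPARE log_insert (timestamptz, text, text) AS
    INSERT INTO logs (ts, host, msg) VALUES ($1, $2, $3);
EXECUTE log_insert ('2009-04-20 21:53:21', 'db1', 'first message');
EXECUTE log_insert ('2009-04-20 21:53:22', 'db1', 'second message');
COMMIT;
```

The PREPARE is done once per connection; after that, each EXECUTE skips
the parse and plan steps.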

> 2. insert into table values (),(),(),()

Using this structure would be more database-agnostic, but I don't
believe it will perform as well as the COPY options.  It might be
interesting to do a large "insert into table values (),(),()" as a
prepared statement, but then you'd need a different prepared statement
for each different number of items you want inserted.
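To illustrate the fixed-size problem (again, "logs" is a made-up table):

```sql
-- Sketch only: this prepared statement handles exactly three rows;
-- every other batch size would need its own prepared statement.
PREPARE log_insert_3 (timestamptz, text, timestamptz, text, timestamptz, text) AS
    INSERT INTO logs (ts, msg) VALUES ($1, $2), ($3, $4), ($5, $6);
EXECUTE log_insert_3 ('2009-04-20 21:53:21', 'msg one',
                      '2009-04-20 21:53:22', 'msg two',
                      '2009-04-20 21:53:23', 'msg three');
```

One common workaround is to prepare a few power-of-two sizes and pad or
split batches to fit, but that adds complexity on the application side.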

> 3. copy from stdin
>   (how do you tell it how many records to read from stdin, or that you  
> have given it everything without disconnecting)

Assuming you're using libpq, you just call PQputCopyEnd().  Then you
call PQgetResult() to check that everything worked ok.
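From psql the equivalent is a backslash-period terminator on a line by
itself; from libpq, PQputCopyData()/PQputCopyEnd() play that role.  A
sketch (column names made up):

```sql
COPY logs (ts, host, msg) FROM STDIN;
2009-04-20 21:53:21	db1	first message
2009-04-20 21:53:22	db1	second message
\.
```

The default text format is tab-delimited, one row per line, so the
server knows a row is complete at each newline and the stream is
complete at the "\." terminator (or PQputCopyEnd() from libpq).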

> 4. copy from stdin in binary mode

Binary mode, in general, should be faster.  You should consider what
format the data is in inside your application, though (it's less useful
to use binary COPY if you have to convert from text to binary in your
application first).

> and each of the options above can be done with prepared statements, 
> stored procedures, or functions.
>
> I know that using procedures or functions can let you do fancy things 
> like inserting the row(s) into the appropriate section of a partitioned 
> table

We would normally recommend having the database handle the partitioning
by using a trigger on the base table to call a stored procedure.  The
application really doesn't need to know about this.
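A sketch of that pattern (partition and table names are made up, and the
date ranges would need to match your partitioning scheme):

```sql
-- Sketch only: a BEFORE INSERT trigger on the base table redirects
-- each row into the appropriate monthly partition.
CREATE OR REPLACE FUNCTION logs_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.ts >= '2009-04-01' AND NEW.ts < '2009-05-01' THEN
        INSERT INTO logs_2009_04 VALUES (NEW.*);
    ELSE
        INSERT INTO logs_overflow VALUES (NEW.*);
    END IF;
    RETURN NULL;  -- row was redirected; skip the insert on the base table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER logs_partition
    BEFORE INSERT ON logs
    FOR EACH ROW EXECUTE PROCEDURE logs_insert_trigger();
```

With this in place, rsyslog just inserts into "logs" and the database
routes each row itself.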

> other than this sort of capability, what sort of differences should be  
> expected between the various approaches (including prepared statements vs 
> unprepared)
>
> since the changes that rsyslog is making will affect all the other  
> database interfaces as well, any comments about big wins or things to  
> avoid for other databases would be appreciated.

Hope this helps.

	Thanks,

		Stephen
