Re: performance for high-volume log insertion

From: david(at)lang(dot)hm
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Date: 2009-04-21 02:24:22
Message-ID: alpine.DEB.1.10.0904201859460.12662@asgard.lang.hm
Lists: pgsql-performance
On Mon, 20 Apr 2009, Stephen Frost wrote:

> David,
>
> * david(at)lang(dot)hm (david(at)lang(dot)hm) wrote:
>> I am working with the rsyslog developers to improve its performance in
>> inserting log messages into databases.
>
> Great!
>
>> currently they have a postgres interface that works like all the other
>> ones, where rsyslog formats an insert statement, passes that to the
>> interface module, which sends it to postgres (yes, each log as a separate
>> transaction)
>
> Ouch.

yep

>> the big win is going to be in changing the core of rsyslog so that it can
>> process multiple messages at a time (bundling them into a single
>> transaction)
>
> Yup.
>
>> 1. begin; insert; insert;...;end
>
> Doing the insert in a transaction should definitely improve your
> performance.  Doing them as prepared statements would be good too, and
> using binary mode would very likely help.
>
>> 2. insert into table values (),(),(),()
>
> Using this structure would be more database agnostic, but I don't believe
> it will perform as well as the COPY options.  It might be interesting to
> do a large "insert into table values (),(),()" as a prepared statement,
> but then you'd have to have different sizes for each different number of
> items you want inserted.

on the other hand, a full queue (lots of stuff to insert) is exactly when
you need the performance the most. if it's enough of a win on the
database side, it could be worth more effort on the application side.
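
for reference, a minimal libpq sketch of option #1 (one transaction, one
prepared statement reused per message). the table and column names are
made up for illustration and error handling is trimmed:

#include <stdio.h>
#include <libpq-fe.h>

/* insert a batch of messages inside one transaction, reusing a
   prepared statement.  "logs(ts, msg)" is a hypothetical table. */
static void insert_batch(PGconn *conn, const char **ts, const char **msg, int n)
{
    PGresult *res;
    int i;

    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQprepare(conn, "log_ins",
                      "INSERT INTO logs (ts, msg) VALUES ($1, $2)", 2, NULL));

    for (i = 0; i < n; i++) {
        const char *params[2] = { ts[i], msg[i] };
        res = PQexecPrepared(conn, "log_ins", 2, params, NULL, NULL, 0);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }

    PQclear(PQexec(conn, "COMMIT"));
}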

>> 3. copy from stdin
>>   (how do you tell it how many records to read from stdin, or that you
>> have given it everything without disconnecting)
>
> Assuming you're using libpq, you just call PQputCopyEnd().  Then you
> call PQgetResult() to check that everything worked ok.

one of the big questions is what value we will get by making things
database specific (more below)
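
for reference, the COPY flow described above looks roughly like this in
libpq (again, table name invented, error handling trimmed). no record
count is needed; PQputCopyEnd() marks the end of the data:

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/* stream a batch of already-formatted "ts<TAB>msg\n" lines via COPY */
static void copy_batch(PGconn *conn, const char **lines, int n)
{
    PGresult *res;
    int i;

    res = PQexec(conn, "COPY logs (ts, msg) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN) {
        fprintf(stderr, "COPY did not start: %s", PQerrorMessage(conn));
        PQclear(res);
        return;
    }
    PQclear(res);

    for (i = 0; i < n; i++)
        PQputCopyData(conn, lines[i], strlen(lines[i]));

    PQputCopyEnd(conn, NULL);          /* NULL means "no error, end of data" */

    res = PQgetResult(conn);           /* did the COPY actually succeed?     */
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
    PQclear(res);
}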

>> 4. copy from stdin in binary mode
>
> Binary mode, in general, should be faster.  You should consider what
> format the data is inside your application though (it's less useful to
> use binary copy if you're having to convert from text to binary in your
> application).

any idea what sort of difference binary mode would result in?
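
to make the question concrete, my understanding of the documented COPY
BINARY row layout is roughly the sketch below (untested): a 16-bit field
count, then a 32-bit length plus raw bytes per field, all in network byte
order, after a one-time 19-byte file header. even text columns have to be
repackaged this way, which is the conversion cost being talked about here:

#include <string.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htons/htonl */

/* encode one (ts, msg) row for COPY BINARY; the result is handed to
   PQputCopyData().  caller must make sure buf is large enough. */
static size_t encode_row_binary(char *buf, const char *ts, const char *msg)
{
    char *p = buf;
    uint16_t nfields = htons(2);
    uint32_t len;

    memcpy(p, &nfields, 2);        p += 2;

    len = htonl(strlen(ts));
    memcpy(p, &len, 4);            p += 4;
    memcpy(p, ts, strlen(ts));     p += strlen(ts);

    len = htonl(strlen(msg));
    memcpy(p, &len, 4);            p += 4;
    memcpy(p, msg, strlen(msg));   p += strlen(msg);

    return p - buf;
}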

>> and each of the options above can be done with prepared statements,
>> stored procedures, or functions.
>>
>> I know that using procedures or functions can let you do fancy things
>> like inserting the row(s) into the appropriate section of a partitioned
>> table
>
> We would normally recommend having the database handle the partitioning
> by using a trigger on the base table to call a stored procedure.  The
> application really doesn't need to know about this.

well, the trigger or stored procedure/function can be part of the database
config, or loaded by the app when it starts.
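
as an example of the "loaded by the app when it starts" case, something
along these lines could be issued once at connect time (the partition
layout, table and trigger names here are invented, following the pattern
in the postgres partitioning docs):

#include <libpq-fe.h>

/* install a simple date-based partitioning trigger at startup; the
   monthly child tables (logs_200904, ...) are assumed to already exist */
static void install_partition_trigger(PGconn *conn)
{
    const char *sql =
        "CREATE OR REPLACE FUNCTION logs_insert_trigger() RETURNS trigger AS $$ "
        "BEGIN "
        "  IF NEW.ts >= DATE '2009-04-01' AND NEW.ts < DATE '2009-05-01' THEN "
        "    INSERT INTO logs_200904 VALUES (NEW.*); "
        "  ELSE "
        "    INSERT INTO logs_overflow VALUES (NEW.*); "
        "  END IF; "
        "  RETURN NULL; "
        "END; $$ LANGUAGE plpgsql; "
        "DROP TRIGGER IF EXISTS logs_partition ON logs; "
        "CREATE TRIGGER logs_partition BEFORE INSERT ON logs "
        "  FOR EACH ROW EXECUTE PROCEDURE logs_insert_trigger();";

    PQclear(PQexec(conn, sql));
}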



one very big question is how much of a gain there is in moving from a 
database agnostic approach to a database specific approach.

currently rsyslog makes use of its extensive formatting capabilities to
format a string along the lines of
$DBformat="insert into table X values ('$timestamp','$msg');"
then it hands the resulting string to the database interface module. This
is a bit of a pain to set up, and not especially efficient, but it has the
ability to insert data into whatever schema you want to use (unlike a lot
of apps that try to force you to use their schema)

I proposed a 5-variable replacement for this to allow for N log entries to
be combined into one string to be sent to the database:

DBinit (one-time things like initializing prepared statements, etc.)
DBstart (string for the start of a transaction)
DBjoin (string used to join multiple DBitems together)
DBend (string for the end of a transaction)
DBitem (formatting of a single action)

so you could do something like

DBstart = "insert into table X values"
DBjoin = ","
DBend = ";"
DBitem = "('$timestamp','$msg')"

and it would create a string like option #2 above
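
to make the assembly concrete, a sketch of how those five pieces could be
glued together for a batch (the function and buffer handling are invented
for illustration; in rsyslog the items would come from the template engine):

#include <string.h>

/* build "DBstart item DBjoin item ... DBend" from pre-formatted items */
static void build_statement(char *out, size_t outsz,
                            const char *start, const char *join,
                            const char *end, const char **items, int n)
{
    int i;

    strncpy(out, start, outsz - 1);
    out[outsz - 1] = '\0';
    for (i = 0; i < n; i++) {
        if (i > 0)
            strncat(out, join, outsz - strlen(out) - 1);
        strncat(out, items[i], outsz - strlen(out) - 1);
    }
    strncat(out, end, outsz - strlen(out) - 1);
}

with start = "insert into table X values", join = ",", end = ";" and items
like "('2009-04-20 18:59:46','some message')", the output is exactly the
multi-row insert of option #2.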

this is extremely flexible. I think it can do everything, including copy,
except binary mode operations. It is also pretty database agnostic.

but people are asking about how to do binary mode, and some were thinking 
that you couldn't do prepared statements in Oracle with a string-based 
interface.

so I decided to post here to try to get an idea of (1) how much
performance would be lost by sticking with strings, and (2) among the
various ways of inserting the data, what sort of performance differences
we are talking about.

David Lang
