Skip site navigation (1) Skip section navigation (2)

Re: performance for high-volume log insertion

From: James Mansion <james(at)mansionfamily(dot)plus(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: david(at)lang(dot)hm, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Date: 2009-04-22 20:53:03
Message-ID: 49EF83AF.2090307@mansionfamily.plus.com (view raw or flat)
Thread:
Lists: pgsql-performance
Stephen Frost wrote:
> You're re-hashing things I've already said.  The big win is batching the
> inserts, however that's done, into fewer transactions.  Sure, multi-row
> inserts could be used to do that, but so could dropping begin/commits in
> right now which probably takes even less effort.
>   
Well, I think you are seriously underestimating the cost of the 
round-trip compared
to all the other effects (possibly bar the commits).  When I tested the 
union insert
technique on SQLServer and Sybase I got measurable improvements going from
100 row statements to 200 row statements, though I suspect in that case the
per-statement overheads are quite high.  I expected improvements from 10 
to 20
row batches, but it carried on getting better for a long time after 
that.  The
Sybase parser runs out of workspace first.


> No, as was pointed out previously already, you really just need 2.  A
>   
And I'm disagreeing with that.  Single row is a given, but I think 
you'll find it pays to have one
round trip if at all possible and invoking multiple prepared statements 
can work against this.

> see if there's really much of a performance difference between a
> 50-insert prepared statement, and 50 1-insert prepared statements.  If
> they're both done in larger transactions, I don't know that there's
> really alot of performance difference.
>   
I think you'll be surprised, but the only way is to test it.  And also 
the simple 50 row single
insert as text.  See if you can measure the difference between that and 
the prepared
statement.
> storage overhead?  indexing overhead?  We're talking about prepared
> statements here, what additional storage requirement do you think those
> would impose?  What additional indexing overhead?  I don't believe we
> actually do anything differently between prepared statements and
> multi-row inserts that would change either of those.
>   
That's my point.  You will brickwall on the actual database operations 
for execution
early enough that the efficiency difference between parse-and-execute 
and prepared
statements will be hard to measure - at least if you have multi-row 
statements.

But this really needs testing and timing.

> Ah, latency is a reasonable thing to bring up.  Of course, if you want
> to talk about latency then you get to consider that multi-insert SQL
> will inherently have larger packet sizes which could cause them to be
> delayed in some QoS arrangements.
>   
No, I mean latency from round trips from the client to the server 
process.  I don't know why
you think I'd mean that.
> As I said, most of this is a re-hash of things already said.  The
> low-hanging fruit here is doing multiple inserts inside of a
> transaction, rather than 1 insert per transaction.  Regardless of how
> that's done, it's going to give the best bang-for-buck.  It will
> complicate the client code some, regardless of how it's implemented, so
> that failures are handled gracefully (if that's something you care about
> anyway), but as there exists some queueing mechanisms in rsyslog
> already, hopefully it won't be too bad.
>   
I think you have largely missed the point. There are two things here:
 1) how many rows per commit
 2) how many rows per logical RPC (ie round trip) between the client
   and server processes

We are agreed that the first is a Very Big Deal, but you seem resistant to
the idea that the second of these is a big deal once you've dealt with 
the former.

My experience has been that its much more important than any benefits of
preparing statements etc, particularly if the use of a prepared 
statement can
make it harder to do multi-row RPCs because the protocol doesn't
allow pipelining (at least without things getting very hairy).

Clearly 'copy' is your friend for this too, at least potentially (even 
if it means
streaming to a staging table).


James


In response to

Responses

pgsql-performance by date

Next:From: Glenn MaynardDate: 2009-04-22 21:04:43
Subject: Re: performance for high-volume log insertion
Previous:From: Tom LaneDate: 2009-04-22 20:49:09
Subject: Re: performance for high-volume log insertion

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group