Skip site navigation (1) Skip section navigation (2)

Re: Poor addBatch performance. Why dosn't it use copy ?

From: Kris Jurka <books(at)ejurka(dot)com>
To: femski <hypertree(at)yahoo(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Poor addBatch performance. Why dosn't it use copy ?
Date: 2007-04-20 15:05:23
Message-ID: Pine.BSO.4.64.0704201053200.26355@leary.csoft.net (view raw or flat)
Thread:
Lists: pgsql-jdbc

On Fri, 20 Apr 2007, femski wrote:

> I took this discussion in Postgres performance  list and came out with
> conclusion that its a
> client side JDBC issue - so I am psting it here.
>
> I have a batch application that writes approx. 4 million rows into a narrow
> (2 column) table. I am using JDBC addBatch/ExecuteBatch with auto commit
> turned off. Batch size is 1000. So far I am seeing Postgres take
> roughly five times (280 sec) the time it takes to do this in the Oracle
> (60). This is on a Linux
> server with Xeon woodcrest 5310 process. Plenty of memory. I have played
> with many parameters on
> the server side and they seem to have little effect - I am sure Postgres is
> a very capable server and its
> not a database server issue. Someone mentioned:
>
> "I actually went and looked at the JDBC api and realized 'addBatch' means to
> run multiple stmts at once, not batch
> inserting.  femski, your best bet is to lobby the JDBC folks to build
> support for 'copy' into the driver for faster bulk loads (or help out in
> that regard). "

This comment is inaccurate.  addBatch is indeed useful for batch inserts. 
Consider the attached test case which inserts a million rows into a single 
column table using prepared statements with and without batches.  I get:

No batch: 148.92
With batch: 48.008

So using batch execution is about three times faster.

Additionaly gains are possible by using multiple threads and multiple 
connections in your client to parallelize the work.  This should provide 
linear scaling to the number of cores/io bandwidth you have available. 
Perhaps oracle is able to do this behind the scenes on the server, but 
postgresql does not.

> Why doesn't the Postgres JDBC driver use "copy" for faster bulk insert ?
> What is the best way to speedup
> do bulk insert at this time or in near future (I was to use standard JDBC
> API) ?
>

Automatically converting from insert to copy is possible in only limited 
circumstances that can be difficult to detect.  For example copy 
cannot handle function calls in data, so you couldn't convert this 
to copy: INSERT INTO t VALUES (now(), ?).  What if t is actually a view 
with an ON INSERT rule?  Copy doesn't work with views.  So some 
complicated parsing and analysis of the query is required.

Sure it's possible, but it's non-trivial and no one has done the work.

Kris Jurka


Attachment: Batch.java
Description: text/plain (1.1 KB)

In response to

pgsql-jdbc by date

Next:From: Kris JurkaDate: 2007-04-20 15:07:23
Subject: Re: Poor addBatch performance. Why dosn't it use copy ?
Previous:From: Ido M. TamirDate: 2007-04-20 14:02:32
Subject: Re: Poor addBatch performance. Why dosn't it use copy ?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group