Re: Poor addBatch performance. Why dosn't it use copy ?

From: Kris Jurka <books(at)ejurka(dot)com>
To: femski <hypertree(at)yahoo(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Poor addBatch performance. Why dosn't it use copy ?
Date: 2007-04-20 15:05:23
Message-ID: Pine.BSO.4.64.0704201053200.26355@leary.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

On Fri, 20 Apr 2007, femski wrote:

> I took this discussion in Postgres performance list and came out with
> conclusion that its a
> client side JDBC issue - so I am psting it here.
>
> I have a batch application that writes approx. 4 million rows into a narrow
> (2 column) table. I am using JDBC addBatch/ExecuteBatch with auto commit
> turned off. Batch size is 1000. So far I am seeing Postgres take
> roughly five times (280 sec) the time it takes to do this in the Oracle
> (60). This is on a Linux
> server with Xeon woodcrest 5310 process. Plenty of memory. I have played
> with many parameters on
> the server side and they seem to have little effect - I am sure Postgres is
> a very capable server and its
> not a database server issue. Someone mentioned:
>
> "I actually went and looked at the JDBC api and realized 'addBatch' means to
> run multiple stmts at once, not batch
> inserting. femski, your best bet is to lobby the JDBC folks to build
> support for 'copy' into the driver for faster bulk loads (or help out in
> that regard). "

This comment is inaccurate. addBatch is indeed useful for batch inserts.
Consider the attached test case which inserts a million rows into a single
column table using prepared statements with and without batches. I get:

No batch: 148.92
With batch: 48.008

So using batch execution is about three times faster.

Additionaly gains are possible by using multiple threads and multiple
connections in your client to parallelize the work. This should provide
linear scaling to the number of cores/io bandwidth you have available.
Perhaps oracle is able to do this behind the scenes on the server, but
postgresql does not.

> Why doesn't the Postgres JDBC driver use "copy" for faster bulk insert ?
> What is the best way to speedup
> do bulk insert at this time or in near future (I was to use standard JDBC
> API) ?
>

Automatically converting from insert to copy is possible in only limited
circumstances that can be difficult to detect. For example copy
cannot handle function calls in data, so you couldn't convert this
to copy: INSERT INTO t VALUES (now(), ?). What if t is actually a view
with an ON INSERT rule? Copy doesn't work with views. So some
complicated parsing and analysis of the query is required.

Sure it's possible, but it's non-trivial and no one has done the work.

Kris Jurka

Attachment Content-Type Size
Batch.java text/plain 1.1 KB

In response to

Browse pgsql-jdbc by date

  From Date Subject
Next Message Kris Jurka 2007-04-20 15:07:23 Re: Poor addBatch performance. Why dosn't it use copy ?
Previous Message Ido M. Tamir 2007-04-20 14:02:32 Re: Poor addBatch performance. Why dosn't it use copy ?