Re: Using a COPY...FROM through JDBC?

From: Markus Schaber <schabios(at)logi-track(dot)com>
Cc: Postgres-JDBC <pgsql-jdbc(at)postgresql(dot)org>, Postgres-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Using a COPY...FROM through JDBC?
Date: 2004-06-07 07:47:58
Message-ID: 20040607094758.5eb0bf71@kingfisher.intern.logi-track.com
Lists: pgsql-jdbc pgsql-performance

Hi, Steve,

Here are the results of some benchmarks we ran, inserting 30k rows into
a table using "time psql -f blubb.sql -q dbname":

File             kingfisher            skate
30kinsert.sql    39.359s    762 r/s    335.024s    90 r/s
30kcommonce.sql  11.402s   2631 r/s      7.086s  4233 r/s
30kwithids.sql   10.138s   2959 r/s      6.936s  4325 r/s
30kprepare.sql    8.173s   3670 r/s      5.189s  5781 r/s
30kdump.sql       1.286s  23328 r/s      0.785s 38216 r/s
30kdumpseq.sql    1.498s  20026 r/s      0.927s 32362 r/s

Kingfisher is the single-processor machine I mentioned last night;
skate is the SMP machine.

The table has five columns (bigint, bigint, double precision, double
precision, timestamp without time zone). The first of them has a
"default nextval('sequence'::text)" definition, and there are no further
constraints or indices.

30kinsert.sql uses simple INSERT statements with autocommit on, and we
insert all but the first column, which is filled by the default
sequence. In this test, kingfisher seems to achieve an unrealistically
high number of commits (syncs) per second (see what I wrote yesterday)
[1], while skate shows a more realistic value.
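
Each line of 30kinsert.sql is a plain statement along these lines
(using the invented table and values sketched above), with psql
committing each statement individually:

    INSERT INTO measurements (ref, x, y, ts)
        VALUES (42, 1.5, 2.5, '2004-06-07 09:00:00');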

30kcommonce.sql, as suspected, gives a considerable boost by
encapsulating everything into a single transaction with one commit.
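
The statements themselves stay the same; the file merely brackets them
in one transaction, so only a single sync is needed at the end:

    BEGIN;
    INSERT INTO measurements (ref, x, y, ts)
        VALUES (42, 1.5, 2.5, '2004-06-07 09:00:00');
    -- ... 30k such inserts ...
    COMMIT;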

30kwithids.sql gives only a small additional boost by inserting
pre-calculated sequence numbers, so it seems not worth the effort to
move this logic into the application.
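
Here the first column is supplied explicitly, so the default nextval()
never fires (ids invented):

    BEGIN;
    INSERT INTO measurements (id, ref, x, y, ts)
        VALUES (1, 42, 1.5, 2.5, '2004-06-07 09:00:00');
    -- ... continuing with pre-calculated ids 2 .. 30000 ...
    COMMIT;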

30kprepare.sql prepares the insert statement and then issues 30k EXECUTE
statements within one transaction; the speedup is noticeable.
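
Sketched with the invented names from above: PREPARE parses and plans
the statement once, and each EXECUTE only ships the parameter values:

    PREPARE ins (bigint, double precision, double precision,
                 timestamp without time zone) AS
        INSERT INTO measurements (ref, x, y, ts)
            VALUES ($1, $2, $3, $4);
    BEGIN;
    EXECUTE ins (42, 1.5, 2.5, '2004-06-07 09:00:00');
    -- ... 30k EXECUTEs ...
    COMMIT;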

30kdump.sql simply inserts the 30k rows as a dump via COPY ... FROM
STDIN (as with 30kwithids.sql, the first column is contained in the
insert data, so the default sequence is not used). Naturally, this is by
far the fastest method.
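
The dump is the usual tab-separated COPY input, with the id column
included (again with invented values):

    COPY measurements FROM STDIN;
    1	42	1.5	2.5	2004-06-07 09:00:00
    2	43	1.6	2.6	2004-06-07 09:00:01
    \.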

30kdumpseq.sql uses COPY, too, but omits the first column and thus
utilizes the sequence generation again. This gives a noticeable 15%
slowdown compared to 30kdump.sql, but still seems fast enough for our
purposes. Sadly, COPY is not available through JDBC.
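
At the psql level, the only change is the explicit column list, which
leaves the first column to the sequence default:

    COPY measurements (ref, x, y, ts) FROM STDIN;
    42	1.5	2.5	2004-06-07 09:00:00
    43	1.6	2.6	2004-06-07 09:00:01
    \.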

Thanks for your patience.

Footnotes:
[1] We suspect this to be some strange interaction between IDE,
cryptoloop and ext3fs, so that the sync() call somehow does not really
wait for the data to be physically written to the disk. (I really can't
imagine a crypto-looped notebook hard disk doing more syncs per second
than a SCSI-based RAID in a server machine. We ran some small benchmarks
on the sync() / fsync() calls that seem to support this conclusion.)

--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios(at)logi-track(dot)com | www.logi-track.com
