Improving PostgreSQL insert performance

From: Frits Jalvingh <jal(at)etc(dot)to>
To: pgsql-performance(at)postgresql(dot)org
Subject: Improving PostgreSQL insert performance
Date: 2017-06-09 13:04:29
Message-ID: CAKhTGFX-ChBSjqENrAv8uqUR_H5PTvf14jG0cOAq6c-ami_7sQ@mail.gmail.com
Lists: pgsql-performance

Hi all,

I am trying to improve the runtime of a big data warehouse application. One
significant bottleneck I found was insert performance, so I am investigating
ways of getting PostgreSQL to insert data faster. I ran several tests on a
fast machine to find out what performs best, and compared the results with
the same actions in Oracle on the same machine.

So far I am finding that PostgreSQL insert performance is several times
slower than Oracle's, and I would be grateful for some help in narrowing
this gap...

To test, I wrote a simple Java program that inserts data into a simple
table, using statement batching and delayed commits; a sketch of the
approach follows the table definition. The table looks as follows:

create table h_test(
id_h integer
, source_system_id integer
, organisation_id integer
, load_dts timestamp without time zone
, boekingdetailid text
);
No constraints, no indexes.
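
In outline, the test program does the following (a minimal sketch, not the
exact program that produced the numbers below; the connection URL,
credentials and class name are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

public class InsertTest {
    static final int ROWS = 1_000_000;
    static final int BATCH_SIZE = 10_000;   // rows per executeBatch()
    static final int COMMIT_SIZE = 50_000;  // rows per commit()

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password")) {
            con.setAutoCommit(false);        // delayed commits
            String sql = "insert into h_test(id_h, source_system_id,"
                    + " organisation_id, load_dts, boekingdetailid)"
                    + " values (?, ?, ?, ?, ?)";
            long start = System.currentTimeMillis();
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                for (int i = 1; i <= ROWS; i++) {
                    ps.setInt(1, i);
                    ps.setInt(2, 1);
                    ps.setInt(3, 12);
                    ps.setTimestamp(4, new Timestamp(System.currentTimeMillis()));
                    ps.setString(5, "boe" + i + "king" + i);
                    ps.addBatch();
                    if (i % BATCH_SIZE == 0)
                        ps.executeBatch();   // send one batch to the server
                    if (i % COMMIT_SIZE == 0)
                        con.commit();        // delayed commit
                }
                ps.executeBatch();           // flush the remainder, if any
            }
            con.commit();
            long millis = System.currentTimeMillis() - start;
            System.out.println("Inserted " + ROWS + " rows in " + millis
                    + " milliseconds, " + (ROWS * 1000.0 / millis)
                    + " rows per second");
        }
    }
}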

The Java program and PostgreSQL run on the same machine. The best results
I've got are:

PostgreSQL inserts:

Commit size 50,000 and batch size 10,000
Inserted 1000000 rows in 7500 milliseconds, 142857.14285714287 rows per
second
Inserted 1000000 rows in 7410 milliseconds, 142857.14285714287 rows per
second

The exact same test done on Oracle (on the same machine) reports:

Inserted 1000000 rows in 1072 milliseconds, 1000000.0 rows per second

Increasing the row count in Oracle decreases this number a bit, but it's
still fast:
Inserted 24000000 rows in 47155 milliseconds, 510638.2978723404 rows per
second (oracle)

compared with:
Inserted 24000000 rows in 159929 milliseconds, 150943.3962264151 rows per
second (postgresql)

I also created a small PL/pgSQL stored procedure to insert the same 1
million rows, which runs in about 4 seconds, i.e. 250,000 rows per second.
This runs inside the DB itself, but it is still twice as slow as Oracle
via JDBC:
CREATE or replace function test() returns void AS $$
DECLARE
    count integer;
BEGIN
    for count in 1..1000000 loop
        insert into h_test(id_h, source_system_id, organisation_id,
            load_dts, boekingdetailid)
        values (count, 1, 12, now(), 'boe' || count || 'king' || count);
    end loop;
END;
$$ LANGUAGE plpgsql;
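
(To reproduce: the function can be called with "select test();"; enabling
\timing in psql is an easy way to measure the runtime.)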

I already changed the following config parameters:
work_mem 512MB
synchronous_commit off
shared_buffers 512MB
commit_delay 100000
autovacuum_naptime 10min
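
For completeness: synchronous_commit can also be disabled per session
rather than globally. From JDBC that is a single statement, e.g. (a sketch,
reusing the connection "con" from the program above):

try (java.sql.Statement st = con.createStatement()) {
    st.execute("SET synchronous_commit = off");  // affects this session only
}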

Postgres version is 9.6.3 on Ubuntu 17.04 (64-bit), on an i7-4790K with
16GB memory and an Intel 750 SSD. The JDBC driver is postgresql-42.1.1.

(By the way: the actual load I'm trying to improve will insert more than
132 million rows, and that number will grow.)

Any help is greatly appreciated!

Regards,

Frits
