
Improve COPY performance for large data sets

From: Ryan Hansen <ryan(dot)hansen(at)brightbuilders(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Improve COPY performance for large data sets
Date: 2008-09-10 16:48:41
Message-ID: 48C7FA69.1080507@brightbuilders.com
Lists: pgsql-performance
Greetings,

I'm relatively new to PostgreSQL but I've been in the IT applications 
industry for a long time, mostly in the LAMP world.

One thing I'm having trouble with is running a COPY of a large file 
(20+ million records) into a table in a reasonable amount of time.  
Currently it takes about 12 hours to complete on a 64-bit server with 
3 GB of memory allocated to shared_buffers and a single 320 GB SATA 
drive.  I don't see any improvement running the same operation on a 
dual dual-core Opteron server with 16 GB of RAM.
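
For concreteness, the load is essentially a plain COPY FROM a 
server-side file, something along these lines (table, column, and file 
names are placeholders, not our real schema):

  -- Placeholder names; the real table is wider, but the shape is the same.
  COPY big_table (id, created_at, payload)
  FROM '/data/exports/records.csv'
  WITH CSV;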

I'm not asking for someone to solve my problem, just some direction on 
the best ways to tune for faster bulk loading, since this will be a 
fairly regular operation for our application (assuming it can work this 
way).  I've toyed with maintenance_work_mem and some of the other 
parameters, but it's still way slower than it seems like it should be.
So any contributions are much appreciated.
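
For reference, the standard "drop indexes, COPY, rebuild" pattern I 
keep running into in my reading boils down to roughly the sketch below. 
All object names and values here are illustrative placeholders, not 
what's actually in our setup:

  SET maintenance_work_mem = '1GB';        -- speeds up the index rebuild below
  -- (checkpoint_segments is a postgresql.conf setting; raising it is also
  --  commonly suggested so the load isn't interrupted by constant checkpoints.)

  DROP INDEX IF EXISTS big_table_payload_idx;  -- avoid per-row index maintenance during COPY

  COPY big_table (id, created_at, payload)
  FROM '/data/exports/records.csv'
  WITH CSV;

  CREATE INDEX big_table_payload_idx ON big_table (payload);
  ANALYZE big_table;                       -- refresh planner statistics after the load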

Thanks!

P.S. Assume I've done a ton of reading and research into PG tuning, 
which I have.  I just can't seem to find anything beyond the basics that 
talks about really speeding up bulk loads.
