Re: Improve COPY performance for large data sets

From: Bill Moran <wmoran(at)collaborativefusion(dot)com>
To: Ryan Hansen <ryan(dot)hansen(at)brightbuilders(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Improve COPY performance for large data sets
Date: 2008-09-10 17:16:06
Message-ID: 20080910131606.cc18bed5.wmoran@collaborativefusion.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

In response to Ryan Hansen <ryan(dot)hansen(at)brightbuilders(dot)com>:
>
> I'm relatively new to PostgreSQL but I've been in the IT applications
> industry for a long time, mostly in the LAMP world.
>
> One thing I'm experiencing some trouble with is running a COPY of a
> large file (20+ million records) into a table in a reasonable amount of
> time. Currently it's taking about 12 hours to complete on a 64 bit
> server with 3 GB memory allocated (shared_buffer), single SATA 320 GB
> drive. I don't seem to get any improvement running the same operation
> on a dual opteron dual-core, 16 GB server.
>
> I'm not asking for someone to solve my problem, just some direction in
> the best ways to tune for faster bulk loading, since this will be a
> fairly regular operation for our application (assuming it can work this
> way). I've toyed with the maintenance_work_mem and some of the other
> params, but it's still way slower than it seems like it should be.
> So any contributions are much appreciated.

There's a program called pgloader which supposedly is faster than copy.
I've not used it so I can't say definitively how much faster it is.

A single 320G drive isn't going to get you much on speed. How many
RPM? Watch iostat on your platform to see if you're saturating the
drive, if you are, the only way you're going to get it faster is to
add more disks in a RAID-10 or similar, or somehow get a faster disk.

You always have the option to turn off fsync, but be sure you understand
the consequences of doing that and have an appropriate failure plan
before doing so.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran(at)collaborativefusion(dot)com
Phone: 412-422-3463x4023

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dimitri Fontaine 2008-09-10 17:17:37 Re: Improve COPY performance for large data sets
Previous Message Alan Hodgson 2008-09-10 17:14:34 Re: Improve COPY performance for large data sets