Re: Improve COPY performance for large data sets

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Bill Moran <wmoran(at)collaborativefusion(dot)com>
Cc: Ryan Hansen <ryan(dot)hansen(at)brightbuilders(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: Improve COPY performance for large data sets
Date: 2008-09-10 20:54:53
Message-ID: 56D9574D-9EB3-410B-9FBA-B1C7329B9E81@hi-media.com
Lists: pgsql-performance
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10 Sept. 08 at 19:16, Bill Moran wrote:
> There's a program called pgloader which supposedly is faster than  
> copy.
> I've not used it so I can't say definitively how much faster it is.

In fact pgloader uses COPY under the hood, and does so over a
network connection (which could be a Unix domain socket), whereas COPY
on the server reads the file content directly from the local file. So
no, pgloader is not inherently faster than COPY.
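
To make the distinction concrete, here is a minimal sketch of the two
paths, assuming a psycopg2-style cursor (execute and copy_expert). The
function names, the table name and the path are hypothetical, and this
is not pgloader's actual code.

```python
def server_side_copy(cursor, table, server_path):
    """Ask the backend to open and read the file itself.

    server_path is resolved on the database server, not on the client.
    """
    # Naive identifier interpolation; a real loader should quote `table`.
    cursor.execute(f"COPY {table} FROM %s", (server_path,))


def client_side_copy(cursor, table, fileobj):
    """Stream rows from the client over the connection, as a loader does."""
    # Assumes psycopg2, whose cursors provide copy_expert(sql, file).
    cursor.copy_expert(f"COPY {table} FROM STDIN", fileobj)
```

In the second form every row crosses the client connection before the
backend sees it, which is the extra hop described above.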

That said, pgloader can split the workload between as many threads as
you want, and so could saturate the I/O when the disk subsystem
performs well enough that a single CPU cannot overload it. Two
parallel loading modes are supported: pgloader will either have N
parts of the file processed by N threads, or have one thread read and
parse the file, then fill up queues for N threads that send COPY
commands to the server.
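
The two modes can be sketched with the Python standard library alone.
This is an illustration under assumptions, not pgloader's
implementation: load_split, load_queued and the send_batch callback
(standing in for "send these rows with COPY") are hypothetical names,
and the input is taken to be a list of lines already in memory.

```python
import queue
import threading

_DONE = object()  # sentinel telling a worker that its queue is finished


def load_split(lines, n_workers, send_batch):
    """Mode 1: cut the input into contiguous parts, one thread per part."""
    size = max(1, (len(lines) + n_workers - 1) // n_workers)  # ceiling
    parts = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Yields at most n_workers parts (fewer for very small inputs).
    threads = [threading.Thread(target=send_batch, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def load_queued(lines, n_workers, send_batch, batch_size=1000):
    """Mode 2: one reader fills N queues, N threads drain them."""
    # Unbounded queues keep the sketch simple; a real loader would bound them.
    queues = [queue.Queue() for _ in range(n_workers)]

    def worker(q):
        while True:
            batch = q.get()
            if batch is _DONE:
                break
            send_batch(batch)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    # The calling thread plays the single reader and deals batches round-robin.
    for n, start in enumerate(range(0, len(lines), batch_size)):
        queues[n % n_workers].put(lines[start:start + batch_size])
    for q in queues:
        q.put(_DONE)
    for t in threads:
        t.join()
```

In both cases send_batch is called from several threads at once, so in
a real loader each thread would need its own database connection.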

Now, it could be that pgloader with a parallel setup performs better
than plain COPY on the server. This remains to be tested; the use case
at hand is said to involve data files of hundreds of GB or a few TB.
I don't have any facilities to test-drive such a setup...

Note that those pgloader parallel options were requested by
PostgreSQL hackers as a testbed for some ideas about a parallel
pg_restore; maybe re-explaining what has been implemented will reopen
this can of worms :)

Regards,
- --
dim

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkjINB0ACgkQlBXRlnbh1bmhkgCgu4TduBB0bnscuEsy0CCftpSp
O5IAoMsrPoXAB+SJEr9s5pMCYBgH/CNi
=1c5H
-----END PGP SIGNATURE-----
