Re: Improve COPY performance for large data sets

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Bill Moran <wmoran(at)collaborativefusion(dot)com>
Cc: Ryan Hansen <ryan(dot)hansen(at)brightbuilders(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Improve COPY performance for large data sets
Date: 2008-09-10 20:54:53
Message-ID: 56D9574D-9EB3-410B-9FBA-B1C7329B9E81@hi-media.com
Lists: pgsql-performance

On 10 Sept. 08, at 19:16, Bill Moran wrote:
> There's a program called pgloader which supposedly is faster than
> copy.
> I've not used it so I can't say definitively how much faster it is.

In fact, pgloader uses COPY under the hood, and does so over a
network connection (which could be a Unix domain socket), whereas COPY
on the server reads the file content directly from the local file. So
no, pgloader is not the right tool if the goal is simply to be faster
than COPY.

That said, pgloader is able to split the workload between as many
threads as you want, and so could saturate I/O when the disk
subsystem performs well enough that a single CPU cannot overload it.
Two parallel loading modes are supported: pgloader will either have N
parts of the file processed by N threads, or have one thread read and
parse the file and then fill up queues for N threads to send COPY
commands to the server.
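
The two modes described above can be sketched roughly as follows. This
is only an illustration, not pgloader's actual code: the function names
and the send_batch callable are hypothetical, and send_batch stands in
for whatever would issue a COPY ... FROM STDIN over its own connection.
For brevity the second mode uses a single shared queue rather than one
queue per thread, and the first mode reads each slice fully into memory,
which a loader for very large files would have to stream instead.

```python
import os
import queue
import threading


def split_on_line_boundaries(path, n_parts):
    """Return up to n_parts (start, end) byte ranges that end on a newline."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_parts):
            f.seek(size * i // n_parts)
            f.readline()  # move forward to the start of the next full line
            pos = min(f.tell(), size)
            if pos > bounds[-1]:
                bounds.append(pos)
    if bounds[-1] < size:
        bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def load_by_file_parts(path, n_threads, send_batch):
    """Mode 1: N threads each read and send their own part of the file."""
    def work(start, end):
        with open(path, "rb") as f:
            f.seek(start)
            # Sketch only: the whole slice is held in memory here.
            send_batch(f.read(end - start).splitlines(keepends=True))

    threads = [threading.Thread(target=work, args=r)
               for r in split_on_line_boundaries(path, n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def load_by_reader_queue(path, n_threads, send_batch, batch_size=1000):
    """Mode 2: one reader fills a queue, N threads drain it and send."""
    q = queue.Queue(maxsize=n_threads * 2)  # bounded, so the reader cannot run far ahead

    def worker():
        while True:
            batch = q.get()
            if batch is None:  # sentinel: no more data
                return
            send_batch(batch)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()

    batch = []
    with open(path, "rb") as f:
        for line in f:
            batch.append(line)
            if len(batch) >= batch_size:
                q.put(batch)
                batch = []
    if batch:
        q.put(batch)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
```

In both functions send_batch is called concurrently from several
threads, so in a real loader each thread would need its own database
connection, and error handling (a worker dying while the reader is
still filling the queue) would need more care than shown here.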

Now, it could be that using pgloader with a parallel setup performs
better than plain COPY on the server. This remains to be tested; the
use case at hand is said to involve data files of hundreds of GB or
a few TB. I don't have any facilities to test-drive such a setup...

Note that those pgloader parallel options were requested by
PostgreSQL hackers in order to test some ideas with respect to a
parallel pg_restore; maybe re-explaining what has been implemented
will reopen this can of worms :)

Regards,
--
dim
