Skip site navigation (1) Skip section navigation (2)

Re: Best COPY Performance

From: "Worky Workerson" <worky(dot)workerson(at)gmail(dot)com>
To: "Markus Schaber" <schabi(at)logix-tt(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Best COPY Performance
Date: 2006-10-23 15:40:57
Message-ID: ce4072df0610230840r4b3df653h3144dab06bdb9ed2@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
Markus,

> Could you COPY one of your tables out to disk via psql, and then COPY it
> back into the database, to reproduce this measurement with your real data?

$ psql -c "COPY my_table TO STDOUT" > my_data
$ ls my_data
2018792 edgescape_pg_load
$ time cat my_data | psql -c "COPY mytable FROM STDIN"
real    5m43.194s
user    0m35.412s
sys     0m9.567s

> Also, how much is the disk load, and CPU usage?

  When I am loading via the perl (which I've established is a
bottleneck), the one CPU core is at 99% for the perl and another is at
30% for a postmaster, vs about 90% for the postmaster when going
through psql.

The disk load is where I start to get a little fuzzy, as I haven't
played with iostat to figure what is "normal".  The local drives
contain PG_DATA as well as all the log files, but there is a
tablespace on the FibreChannel SAN that contains the destination
table.  The disk usage pattern that I see is that there is a ton of
consistent activity on the local disk, with iostat reporting an
average of 30K Blk_wrtn/s, which I assume is the log files.  Every
several seconds there is a massive burst of activity on the FC
partition, to the tune of 250K Blk_wrtn/s.

> On a table with no indices, triggers and contstraints, we managed to
> COPY about 7-8 megabytes/second with psql over our 100 MBit network, so
> here the network was the bottleneck.

hmm, this makes me think that either my PG config is really lacking,
or that the SAN is badly misconfigured, as I would expect it to
outperform a 100Mb network.  As it is, with a straight pipe to psql
COPY, I'm only working with a little over 5.5 MB/s.  Could this be due
to the primary key index updates?

Thanks!

In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2006-10-23 16:01:41
Subject: Re: Index on two columns not used
Previous:From: Markus SchaberDate: 2006-10-23 15:17:06
Subject: Re: Index on two columns not used

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group