Best COPY Performance

From: "Worky Workerson" <worky(dot)workerson(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Best COPY Performance
Date: 2006-10-20 20:05:33
Message-ID: ce4072df0610201305u7836dc7bn437ec0aaf1c3e25f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

What is the best COPY performance that you have gotten on a "normal" table?

I know that this is question is almost too general, but it might help
me out a bit, or at least give me the right things to tweak. Perhaps
the question can be rewritten as "Where are the major bottlenecks in a
COPY?" or "How can I compute the max theoretical COPY performance for
my hardware?". The two subquestions that I have from this are:
-Are my ETL scripts (perl) maximizing the database COPY speeds?
-Can I tweak my DB further to eek out a bit more performance?

I'm using perl to ETL a decent sized data set (10 million records) and
then loading it through perl::DBI's copy. I am currently getting
between 10K and 15K inserts/second. I've profiled the ETL scripts a
bit and have performance-improved a lot of the code, but I'd like to
determine whether it makes sense to try and further optimize my Perl
or count it as "done" and look for improvements elsewhere.

I ran trivial little insert into a table with a single integer row and
came close to 250K inserts/second using psql's \copy, so I'm thinking
that my code could be optimized a bit more, but wanted to check around
to see if that was the case.

I am most interested in loading two tables, one with about 21 (small)
VARCHARs where each record is about 200 bytes, and another with 7
INTEGERs, 3 TIMESTAMPs, and 1 BYTEA where each record is about 350
bytes.

I have implemented most of the various bits of PG config advice that I
have seen, both here and with a some googling, such as:

wal_buffers=128
checkpoint_segments=128
checkpoint_timeout=3000

Software: PG 8.1.3 on RHEL 4.3 x86_64
Hardware: Quad Dual-core Opteron, Fibre Channel SAN with 256M BBC

Thanks!

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message alvis 2006-10-20 20:26:02 Re: New hardware thoughts
Previous Message Joshua D. Drake 2006-10-20 18:52:09 Re: New hardware thoughts