Quick Links

COPY Performance

From:	"Hans Zaunere" <lists(at)zaunere(dot)com>
To:	<pgsql-general(at)postgresql(dot)org>
Subject:	COPY Performance
Date:	2008-05-04 23:11:35
Message-ID:	00db01c8ae3c$32c76be0$985643a0$@com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hello,

We're using a statement like this to dump between 500K and >5 million rows.

COPY(SELECT SomeID FROM SomeTable WHERE SomeColumn > '0')
TO '/dev/shm/SomeFile.csv'

Upon first run, this operation can take several minutes. Upon second run,
it will be complete in generally well under a minute.

Some observations/thoughts:

-- When watching the file grow, it will "trickle" upon first run, increasing
in size by only a few kb/s. Upon second run, it will grow by megabytes per
second.

-- If we dump to disk, the same behavior is observed - no apparent RAM
contention issues.

-- The SELECT by itself will complete typically in a few seconds, either on
second or first run.

-- Upon the first run, the PostgreSQL process typically consumes <10% CPU.

-- In very rare cases, we've seen even the first run be as fast as the
second run.

-- There is no other activity on the server while performing these tests.

-- SomeID is just an int

-- CPU usage (wait %) is low; seems as though there is some other bottleneck

Any thoughts on:

-- tuning parameters to speed up the initial execution

-- why is it so slow to begin with?

-- possible areas to check for bottlenecks?

-- better method for accomplishing the same thing?

Thank you,

Responses

Re: COPY Performance at 2008-05-04 23:41:01 from Tom Lane
Re: COPY Performance at 2008-05-05 00:11:15 from Scott Marlowe
Re: COPY Performance at 2008-05-05 15:03:15 from Scott Marlowe

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tom Lane	2008-05-04 23:41:01	Re: COPY Performance
Previous Message	Greg Smith	2008-05-04 23:03:29	Re: Interesting comments about fsync on Linux