Re: How long should it take to insert 200,000 records?

From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Scott Marlowe" <smarlowe(at)g2switchworks(dot)com>
Cc: "Karen Hill" <karen_hill22(at)yahoo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: How long should it take to insert 200,000 records?
Date: 2007-02-06 16:40:01
Message-ID: b42b73150702060840y5594ca57mace48fdbdb21fb78@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 2/6/07, Scott Marlowe <smarlowe(at)g2switchworks(dot)com> wrote:
> On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
> > I have a pl/pgsql function that is inserting 200,000 records for
> > testing purposes. What is the expected time frame for this operation
> > on a pc with 1/2 a gig of ram and a 7200 RPM disk? The processor is
> > a 2ghz cpu. So far I've been sitting here for about 2 million ms
> > waiting for it to complete, and I'm not sure how many inserts postgres
> > is doing per second.
>
> That really depends. Doing 200,000 inserts as individual transactions
> will be fairly slow. Since PostgreSQL generally runs in autocommit
> mode, this means that if you didn't expressly begin a transaction, you
> are in fact inserting each row as a transaction. i.e. this:

I think OP is doing insertion inside a pl/pgsql loop...transaction is
implied here. For creating test data, generate_series or
insert...select is obviously the way to go. If that's unsuitable for
some reason, I would suggest RAISE NOTICE every n records so you can
monitor the progress and make sure something is not binding up in a
lock or something like that. Be especially wary of degrading
performance during the process.

Another common problem with poor insert performance is a RI check to
an un-indexed column. In-transaction insert performance should be
between 1k and 10k records/second in normal situations, meaning if you
haven't inserted 1 million records inside of an hour something else is
going on.

Generally, insertion performance from fastest to slowest is:
* insert select generate_series...
* insert select
* copy
* insert (),(),()[...] (at least 10 or preferably 100 insertions)
* begin, prepare, n prepared inserts executed, commit
* begin, n inserts, commit
* plpgsql loop, single inserts
* n inserts outside of transaction.

The order of which is faster might not be absolutely set in stone
(copy might beat insert select for example), but the top 4 methods
will always be much faster than the bottom 4.

merlin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2007-02-06 16:55:53 Re: How long should it take to insert 200,000 records?
Previous Message Scott Marlowe 2007-02-06 16:14:44 Re: How long should it take to insert 200,000 records?