inserting large number of rows was: Re: Increasing number of PG connections.

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Qing Zhao <qzhao(at)quotefx(dot)net>
Cc: Kevin Barnard <kbarnard(at)speedfc(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: inserting large number of rows was: Re: Increasing number of PG connections.
Date: 2004-02-02 20:49:53
Message-ID: Pine.LNX.4.33.0402021346040.20085-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, 2 Feb 2004, Qing Zhao wrote:

> I am new here. I have a question related to this in some way.
>
> Our web site needs to upload a large volume of data into Postgres at a
> time. The performance deterioates as number of rows becomes larger.
> When it reaches 2500 rows, it never come back to GUI. Since the tests
> were run through GUI, my suspision is
> that it might be caused by the way the application server talking to
> Postgres server, the connections, etc.. What might be the factors
> involved here? Does anyone know?

Actually, I'm gonna go out on a limb here and assume two things:

1. you've got lotsa fk/pk relationships setup.
2. you're analyzing the table empty before loading it up.

What happens in this instance is that the analyze on an empty, or nearly
so, table, means that during the inserts, postgresql thinks you have only
a few rows. At first, this is fine, as pgsql will seq scan the
tables to make sure there is a proper key in both. As the number of
rows increases, the planner needs to switch to index scans but doesn't,
because it doesn't know that the number of rows is increasing.

Fix: insert a few hundred rows, run analyze, check to see if the explain
for inserts is showing index scans or not. If not, load a few more
hundred rows, analyze, rinse, repeat.

Also, look for fk/pk mismatches. I.e. an int4 field pointing to an int8
field. That's a performance killer, so if the pk/fk types don't match,
see if you can change your field types to match and try again.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Huxton 2004-02-02 20:52:19 Bulk Record upload (was Re: Increasing number of PG connections)
Previous Message Robert Treat 2004-02-02 20:14:08 Re: MySQL+InnoDB vs. PostgreSQL test?