Skip site navigation (1) Skip section navigation (2)

Re: Insert performance vs Table size

From: Jacques Caron <jc(at)directinfos(dot)com>
To: "Praveen Raja" <praveen(dot)raja(at)netlight(dot)se>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Insert performance vs Table size
Date: 2005-06-28 10:43:47
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance

At 11:50 28/06/2005, Praveen Raja wrote:
>I assume you took size to mean the row size?

Nope, the size of the table.

>  What I really meant was
>does the number of rows a table has affect the performance of new
>inserts into the table (just INSERTs) all other things remaining
>constant. Sorry for the confusion.

As I said previously, in most cases it does. One of the few cases where it 
doesn't would be an append-only table, no holes, no indexes, no foreign keys...

>I know that having indexes on the table adds an overhead but again does
>this overhead increase (for an INSERT operation) with the number of rows
>the table contains?

It depends on what you are indexing. If the index key is something that 
grows monotonically (e.g. a unique ID or a timestamp), then the size of the 
table (and hence of the indexes) should have a very limited influence on 
the INSERTs. If the index key is anything else (and that must definitely be 
the case if you have 7 or 8 indexes!), then that means updates will happen 
all over the indexes, which means a lot of read and write activity, and 
once the total size of your indexes exceeds what can be cached in RAM, 
performance will decrease quite a bit. Of course if your keys are 
concentrated in a few limited areas of the key ranges it might help.

>My instinct says no to both. If I'm wrong can someone explain why the
>number of rows in a table affects INSERT performance?

As described above, maintaining indexes when you "hit" anywhere in said 
indexes is very costly. The larger the table, the larger the indexes, the 
higher the number of levels in the trees, etc. As long as it fits in RAM, 
it shouldn't be a problem. Once you exceed that threshold, you start 
getting a lot of random I/O, and that's expensive.

Again, it depends a lot on your exact schema, the nature of the data, the 
spread of the different values, etc, but I would believe it's more often 
the case than not.


In response to

pgsql-performance by date

Next:From: Andrew DunstanDate: 2005-06-28 12:53:50
Subject: Re: COPY FROM performance improvements
Previous:From: Praveen RajaDate: 2005-06-28 09:50:08
Subject: Re: Insert performance vs Table size

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group