Re: Bulk Insert and Index use

From: "Gregory S(dot) Williamson" <gsw(at)globexplorer(dot)com>
To: <tech(at)wildcash(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Bulk Insert and Index use
Date: 2004-08-11 03:53:11
Message-ID: 71E37EF6B7DCC1499CEA0316A256832801057DE5@loki.wc.globexplorer.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

If it has to read a majority (or even a good percentage) of the rows in question a sequential scan is probably faster ... and as Jim pointed out, a temp table can often be a useful medium for getting speed in a load and then allowing you to clean/alter data for a final (easy) push.

G
-----Original Message-----
From: Rudi Starcevic [mailto:tech(at)wildcash(dot)com]
Sent: Tue 8/10/2004 8:33 PM
To: pgsql-performance(at)postgresql(dot)org
Cc:
Subject: Re: [PERFORM] Bulk Insert and Index use
Hi Jim,

Thanks for your time.

> If the bulk load has the possibility of duplicating data

Yes, each row will require either:

a) One SELECT + One INSERT
or
b) One SELECT + One UPDATE

I did think of using more than one table, ie. temp table.
As each month worth of data is added I expect to see
a change from lots of INSERTS to lots of UPDATES.

Perhaps when the UPDATES become more dominant it would
be best to start using Indexes.

While INSERTS are more prevelant perhaps a seq. scan is better.

I guess of all the options available it boils down to which
is quicker for my data: index or sequential scan.

Many thanks.

Jim J wrote:

> If the bulk load has the possibility of duplicating data, then you need
> to change methods. Try bulk loading into a temp table, index it like
> the original, eliminate the dups and merge the tables.
>
> It is also possible to do an insert from the temp table into the final
> table like:
> insert into original (x,x,x) (select temp.1, temp.2, etc from temp left
> join original on temp.street=original.street where original.street is null)
>
> Good Luck
> Jim
>
> Rudi Starcevic wrote:
>
>> Hi,
>>
>> I have a question on bulk checking, inserting into a table and
>> how best to use an index for performance.
>>
>> The data I have to work with is a monthly CD Rom csv data dump of
>> 300,000 property owners from one area/shire.
>>
>> So every CD has 300,000 odd lines, each line of data which fills the
>> 'property' table.
>>
>> Beginning with the first CD each line should require one SELECT and
>> one INSERT as it will be the first property with this address.
>>
>> The SELECT uses fields like 'street' and 'suburb', to check for an
>> existing property,
>> so I have built an index on those fields.
>>
>> My question is does each INSERT rebuild the index on the 'street' and
>> 'suburb' fields?
>> I believe it does but I'm asking to be sure.
>>
>> If this is the case I guess performance will suffer when I have, say,
>> 200,000
>> rows in the table.
>>
>> Would it be like:
>>
>> a) Use index to search on 'street' and 'suburb'
>> b) No result? Insert new record
>> c) Rebuild index on 'street' and 'suburb'
>>
>> for each row?
>> Would this mean that after 200,000 rows each INSERT will require
>> the index of 000's of rows to be re-indexed?
>>
>> So far I believe my only options are to use either and index
>> or sequential scan and see which is faster.
>>
>> A minute for your thoughts and/or suggestions would be great.
>>
>> Thanks.
>> Regards,
>> Rudi.
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 4: Don't 'kill -9' the postmaster
>>
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>
>

--

Regards,
Rudi.

Internet Media Productions

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Browse pgsql-performance by date

  From Date Subject
Next Message Rudi Starcevic 2004-08-11 04:10:52 Re: Bulk Insert and Index use
Previous Message Rudi Starcevic 2004-08-11 03:33:33 Re: Bulk Insert and Index use