Re: What's the fastest way to do this?

From: "Colin 't Hart" <cthart(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: What's the fastest way to do this?
Date: 2001-11-09 08:39:10
Message-ID: 9sg4mn$1r5e$1@news.tht.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Einar Karttunen writes:

> > I have several really big tables that have columns uniquely identified
by
> > single or multiple rows. [ I have about 25 tables, 10k to 500k rows
> > per table ]
> >
> > Each day I get a flat file of updates. I have no way of knowing which
> > lines in the file are new records and which are updates for existing
> > records.
> >
> > I need a way to insert the new ones and update the old ones. I have
> > a couple of ideas but none of them seem fast enough ( I will soon
> > be getting updates faster than I can feed them into the database ).
>
> I was facing a similar problem some time ago. My solution was to create
> a temp table and COPY the new data to it. After that I deleted all records
> in the original table which existed in the temporary table. Then I just
> did a insert from a select * from the temp table. Of course with this
> approach you have to lock the tables.
>
> - Einar Karttunen

Even faster is to:

1. load the data into a temporary table (even faster is to load into a
permanent
table -- just truncate it first in each run). Let's call this table
ImportTable.

2. update the existing records
UPDATE rt
SET a = t.a, b = x.b, c = x.c
FROM RealTable AS rt, ImportTable AS x
WHERE rt.pk = x.pk

3. insert the new records
INSERT INTO RealTable(pk, a, b, c)
SELECT pk, a, b, c
FROM ImportTable AS x
WHERE NOT EXISTS (SELECT 1 FROM RealTable WHERE pk = x.pk)

'pk' is the primary key of the table, or, at worst, a UNIQUEly INDEXed
column
combination.

This avoids the expensive DELETE operation (DBMSs are generally better at
INSERTing and UPDATEing than DELETEing, and remember that the DELETEs have
to
go through the transaction log).

Don't use cursors if you can help it -- cursors can be up to several orders
of
magnitude slower, and usually at least 4 times slower.

Using an import table allows you to sanitize the data by insert a step to do
this between steps 1 and 2.

Cheers,

Colin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message darwin 2001-11-09 09:22:10 problem connecting client to a postgresSQL server
Previous Message Stuart Robinson 2001-11-09 07:51:33 inserts on views using rules