Re: Good way to insert/update when you're not sure of duplicates?

From: Curtis Stanford <curtis(at)stanfordcomputing(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Good way to insert/update when you're not sure of duplicates?
Date: 2003-09-25 17:32:11
Message-ID: 200309251132.11906.curtis@stanfordcomputing.com
Lists: pgsql-general

On September 25, 2003 11:27 am, Richard Huxton wrote:
> On Thursday 25 September 2003 02:23, Curtis Stanford wrote:
> > I'm in a situation where I need to load an ASCII file into a database. No
> > sweat so far, right? The records are indexed by date, and I have a unique
> > index on date. The ASCII files can overlap, meaning a file can contain
> > duplicate dates that have already been loaded.
> >
> > I started out with this ingenious idea: just try to insert the record; if
> > I get an error about duplicate keys, do an update and switch to update
> > mode; if I'm updating and get zero rows updated, switch back to insert.
> > Works fine, except the whole thing is in a transaction, and any error
> > aborts the transaction so I can't commit it. So I decided I'd have to
> > try to update every single record and, if zero rows were updated, do the
> > insert. That takes a looooong time: many hours for just 86,000 records or
> > so.
> >
> > Is there a standard way to do this? I can't imagine I'm the only guy who
> > needs to do this kind of thing.
>
> Try inserting a batch of 1024. If you get an error, drop down to 512 and
> try that. Repeat until the batch works or you've reached a size of 1.
> If the batch worked, try the next set of records and repeat. If you reach
> a batch size of 1 with no success, switch to updating and repeat the
> cycle, increasing your batch size as you go.
>
> You might find it quickest to halve the batch size while you're hitting
> problems and double it while things work. The right balance will depend on
> how many insert vs. update rows you have.

Hey, thanks! I actually got the time down to around 1000 rows/sec. I was
passing one of the int arguments in the UPDATE's WHERE clause as a double. I
don't know why, but changing it to an int, as it should be, drastically
reduced the time.
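My guess after the fact: with the parameter arriving as a double, the
comparison against the int column is cross-type, so the planner can't use the
index and every UPDATE turns into a sequential scan. A hypothetical
before/after (psycopg2-style; the readings/seq/value names are made up):

import psycopg2  # assumed driver; any DB-API client shows the same effect

conn = psycopg2.connect("dbname=test")  # hypothetical connection string
cur = conn.cursor()

seq = float("11906")  # e.g. a key parsed from the ASCII file as a float

# Slow: the double parameter makes the comparison against the int column
# cross-type, so the index on seq goes unused.
cur.execute("UPDATE readings SET value = %s WHERE seq = %s", (1.5, seq))

# Fast: an int parameter matches the indexed column's type exactly.
cur.execute("UPDATE readings SET value = %s WHERE seq = %s", (1.5, int(seq)))
conn.commit()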
Your solution is very interesting. I'll probably try it and see which is
faster.
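Sketching it out for myself, the halving cycle might look something like this
in Python (again assuming psycopg2 and the made-up readings table, and
simplified to drop to a per-row UPDATE at batch size 1 rather than switching
into a batched update mode):

import psycopg2

def upsert_rows(conn, rows, max_batch=1024):
    # rows is a list of (rdate, value) tuples parsed from the ASCII file.
    cur = conn.cursor()
    i, size = 0, max_batch
    while i < len(rows):
        batch = rows[i:i + size]
        try:
            cur.executemany(
                "INSERT INTO readings (rdate, value) VALUES (%s, %s)", batch)
            conn.commit()                    # whole batch was new: keep going
            i += len(batch)
            size = min(size * 2, max_batch)  # grow again while inserts work
        except psycopg2.IntegrityError:
            conn.rollback()                  # a duplicate key aborted the batch
            if size > 1:
                size //= 2                   # bisect toward the duplicate
            else:
                d, v = batch[0]              # one duplicate row: update instead
                cur.execute(
                    "UPDATE readings SET value = %s WHERE rdate = %s", (v, d))
                conn.commit()
                i += 1
                size = max_batch

Committing per batch is what keeps one duplicate-key error from poisoning the
rest of the load, which was my original problem in the first place.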
