Skip site navigation (1) Skip section navigation (2)

Re: Bulkloading using COPY - ignore duplicates?

From: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bulkloading using COPY - ignore duplicates?
Date: 2001-10-01 13:54:25
Message-ID: 15288.30097.74590.206271@elsick.csl.co.uk (view raw or flat)
Thread:
Lists: pgsql-hackers
Tom Lane writes:
 > Lee Kindness <lkindness(at)csl(dot)co(dot)uk> writes:
 > > Would this seem a reasonable thing to do? Does anyone rely on COPY
 > > FROM causing an ERROR on duplicate input?
 > Yes.  This change will not be acceptable unless it's made an optional
 > (and not default, IMHO, though perhaps that's negotiable) feature of
 > COPY.

I see where you're coming from, but seriously what's the use/point of
COPY aborting and doing a rollback if one duplicate key is found? I
think it's quite reasonable to presume the input to COPY has had as
little processing done on it as possible. I could loop through the
input file before sending it to COPY but that's just wasting cycles
and effort - Postgres has btree lookup built in, I don't want to roll
my own before giving Postgres my input file!

 > The implementation might be rather messy too.  I don't much care
 > for the notion of a routine as low-level as bt_check_unique knowing
 > that the context is or is not COPY.  We might have to do some
 > restructuring.

Well in reality it wouldn't be "you're getting run from copy" but
rather "notice on duplicate, rather than error & exit". There is a
telling comment in nbtinsert.c just before _bt_check_unique() is
called:

	/*
	 * If we're not allowing duplicates, make sure the key isn't already
	 * in the index.  XXX this belongs somewhere else, likely
	 */

So perhaps dupes should be searched for before _bt_doinsert is called,
or somewhere more appropriate?

 > > Would:
 > >  WITH ON_DUPLICATE = CONTINUE|TERMINATE (or similar)
 > > need to be added to the COPY command (I hope not)?
 > It occurs to me that skip-the-insert might be a useful option for
 > INSERTs that detect a unique-key conflict, not only for COPY.  (Cf.
 > the regular discussions we see on whether to do INSERT first or
 > UPDATE first when the key might already exist.)  Maybe a SET variable
 > that applies to all forms of insertion would be appropriate.

That makes quite a bit of sense.

-- 
 Lee Kindness, Senior Software Engineer
 Concept Systems Limited.

In response to

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2001-10-01 14:02:54
Subject: Re: Bulkloading using COPY - ignore duplicates?
Previous:From: Justin CliftDate: 2001-10-01 13:48:19
Subject: Re: Spinlock performance improvement proposal

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group