
Bulkloading using COPY - ignore duplicates?

From: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Cc: lkindness(at)csl(dot)co(dot)uk
Subject: Bulkloading using COPY - ignore duplicates?
Date: 2001-10-01 11:04:41
Message-ID: 15288.19913.741567.344714@elsick.csl.co.uk
Lists: pgsql-hackers

Hello,

I'm in the process of porting a large application from Ingres to
PostgreSQL. We make heavy use of bulkloading using the 'COPY'
statement in ESQL/C. Consider the SQL statements below (in a psql
session on an arbitrary database):

 CREATE TABLE copytest(f1 INTEGER, f2 INTEGER);
 CREATE UNIQUE INDEX copytest_idx ON copytest USING BTREE(f1, f2);
 COPY copytest FROM '/tmp/copytest';

Given the file /tmp/copytest:

 1	1
 2	2
 3	3
 4	4
 4	4
 5	5
 6	6

the COPY will fail with the following error:

 ERROR:  copy: line 5, Cannot insert a duplicate key into unique index copytest_idx

However, my application code assumes that duplicate rows are simply
ignored (this is the behaviour in Ingres, and I believe in Oracle's
bulkloader too). I propose modifying _bt_check_unique() in
src/backend/access/nbtree/nbtinsert.c to elog() a NOTICE (rather
than an ERROR) and return NULL (or an appropriate value) to the
caller when a duplicate key is detected while a 'COPY FROM' is in
progress (a new parameter would flag this).

Does this seem a reasonable thing to do? Does anyone rely on COPY
FROM causing an ERROR on duplicate input? Would:

 WITH ON_DUPLICATE = CONTINUE|TERMINATE (or similar)

need to be added to the COPY command (I hope not)?
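
In the meantime, a possible workaround with the current behaviour is
to COPY into an unindexed staging table and de-duplicate while
inserting into the real one (copytest_load is just an illustrative
name), at the cost of roughly doubling the I/O:

 -- staging table: same shape as copytest, but no unique index
 CREATE TEMP TABLE copytest_load (f1 INTEGER, f2 INTEGER);
 COPY copytest_load FROM '/tmp/copytest';
 -- insert only rows not already in copytest, collapsing duplicates
 INSERT INTO copytest
   SELECT DISTINCT f1, f2
   FROM copytest_load l
   WHERE NOT EXISTS (SELECT 1 FROM copytest c
                     WHERE c.f1 = l.f1 AND c.f2 = l.f2);
 DROP TABLE copytest_load;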

Thanks,

-- 
 Lee Kindness, Senior Software Engineer
 Concept Systems Limited.
