Re: Bulkloading using COPY - ignore duplicates?

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Jim Buttafuoco <jim(at)buttafuoco(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bulkloading using COPY - ignore duplicates?
Date: 2002-01-02 21:09:36
Message-ID: 200201022109.g02L9aW27520@candle.pha.pa.us
Lists: pgsql-hackers
Lee Kindness wrote:
> Tom Lane writes:
>  > Lee Kindness <lkindness(at)csl(dot)co(dot)uk> writes:
>  > > In an ideal world 'COPY FROM' would only be used with data output by
>  > > 'COPY TO' and it would be nice and sanitised. However in some fields
>  > > this often is not a possibility due to performance constraints!
>  > Of course, the more bells and whistles we add to COPY, the slower it
>  > will get, which rather defeats the purpose no?
> 
> Indeed, but as I've mentioned in this thread in the past, the code
> path for COPY FROM already does a check against the unique index (if
> there is one) but bombs-out rather than handling it...
> 
> It wouldn't add any execution time if there were no duplicates in the
> input!

I know many purists object to allowing COPY to discard invalid rows in
COPY input, but it seems we have lots of requests for this feature, with
few workarounds except pre-processing the flat file.  Of course, if they
use INSERT, they will get errors that they can just ignore.  I don't see
how allowing errors in COPY is any more illegal, except that COPY is one
command while multiple INSERTs are separate commands.
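
For reference, the pre-processing workaround mentioned above might look roughly like the following. This is only a sketch under stated assumptions: the input is tab-delimited COPY text format, the unique key is the first column, and the function name drop_duplicate_keys is made up for illustration. It only catches duplicates within the file itself, not conflicts with rows already in the table.

```python
def drop_duplicate_keys(lines, key_columns=(0,), delimiter="\t"):
    """Keep the first row seen for each key; report line numbers of the rest.

    Hypothetical helper, not part of PostgreSQL. Assumes COPY text format,
    where a literal tab always separates fields.
    """
    seen = set()
    kept = []
    duplicate_line_numbers = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        key = tuple(fields[i] for i in key_columns)
        if key in seen:
            # Same key appeared earlier in the file: COPY would abort here.
            duplicate_line_numbers.append(line_number)
        else:
            seen.add(key)
            kept.append(line)
    return kept, duplicate_line_numbers


# Small in-memory example; a real run would iterate over the flat file.
sample = ["1\talpha\n", "2\tbeta\n", "1\tgamma\n"]
kept, dups = drop_duplicate_keys(sample)
```

The kept lines would then be written out and fed to COPY as usual. The cost is an extra pass over the data, which is what people doing bulk loads are trying to avoid.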

Seems we need to allow such a capability, if only crudely.  I don't
think we can create a discard file because of the problem with remote
COPY.

I think we can allow something like:

	COPY FROM '/tmp/x' WITH ERRORS 2

meaning we will allow at most two errors and will report the error line
numbers to the user.  I think this syntax clearly indicates that errors
are being accepted in the input.  An alternate syntax would allow an
unlimited number of errors:

	COPY FROM '/tmp/x' WITH ERRORS

The errors can be non-unique errors, or even CHECK constraint errors.
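
To make the intended behaviour concrete, here is a rough client-side model of the two forms. It is not an implementation and says nothing about how the backend would do it: load_with_error_limit, TooManyErrors and the in-memory "table" are all hypothetical names, and a ValueError stands in for a unique or CHECK violation.

```python
class TooManyErrors(Exception):
    """Raised when more rows fail than the caller agreed to tolerate."""


def load_with_error_limit(rows, insert_row, max_errors=None):
    """Model of 'WITH ERRORS n' (max_errors=n) and 'WITH ERRORS' (None).

    Returns (rows_loaded, error_line_numbers). Illustration only.
    """
    loaded = 0
    error_lines = []
    for line_number, row in enumerate(rows, start=1):
        try:
            insert_row(row)
        except ValueError:
            # Stand-in for a constraint violation on this input line.
            error_lines.append(line_number)
            if max_errors is not None and len(error_lines) > max_errors:
                raise TooManyErrors(f"error limit exceeded at line {line_number}")
        else:
            loaded += 1
    return loaded, error_lines


def make_unique_inserter():
    """Toy table with a unique constraint on the first column."""
    table = {}

    def insert_row(row):
        if row[0] in table:
            raise ValueError(f"duplicate key {row[0]!r}")
        table[row[0]] = row

    return insert_row, table


rows = [(1, "a"), (2, "b"), (1, "c"), (2, "d"), (3, "e")]
insert_row, table = make_unique_inserter()
loaded, error_lines = load_with_error_limit(rows, insert_row, max_errors=2)
```

With a limit of 2 the load completes and reports lines 3 and 4; with a limit of 1 it gives up at line 4. What should happen to rows already loaded when the limit is exceeded is left open here.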

Unless I hear complaints, I will add it to TODO.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
