Re: Bulkloading using COPY - ignore duplicates?

From: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>
To: Patrick Welche <prlw1(at)newn(dot)cam(dot)ac(dot)uk>
Cc: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bulkloading using COPY - ignore duplicates?
Date: 2001-12-13 15:44:31
Message-ID: 15384.52447.737930.610518@elsick.csl.co.uk
Lists: pgsql-hackers

Patrick Welche writes:
> On Thu, Dec 13, 2001 at 01:25:11PM +0000, Lee Kindness wrote:
> > That's what I'm currently doing as a workaround - a SELECT DISTINCT
> > from a temporary table into the real table with the unique index on
> > it. However this takes absolute ages - say 5 seconds for the copy
> > (which is the ballpark figure I'm aiming toward and can achieve with
> > Ingres) plus another 30ish seconds for the SELECT DISTINCT.
> Then your column really isn't unique,

That's another discussion entirely ;) - the data is spat out by a real-time
system which doesn't have the time or resources to check for duplicates.
Further precision loss later in the data's life adds more duplicates...
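
For reference, the current workaround looks roughly like this - only a
sketch, with made-up table, column and file names, and it assumes the
duplicate rows are identical in every column:

CREATE TEMPORARY TABLE tab_load (p1 INT, p2 INT, other1 INT, other2 INT);
COPY tab_load FROM 'file';
INSERT INTO tab SELECT DISTINCT * FROM tab_load;  -- tab carries the unique index on (p1, p2)
DROP TABLE tab_load;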

> so how about dropping the unique index, import the data, fix the
> duplicates, recreate the unique index - just as another possible
> work around ;)

This is just going to take the same(ish) time, no?

CREATE TABLE tab (p1 INT, p2 INT, other1 INT, other2 INT);
COPY tab FROM 'file';
-- remove every row that has another row with the same (p1, p2)
-- and a lower ctid, i.e. keep one copy of each pair
DELETE FROM tab
 USING tab AS dup
 WHERE dup.p1 = tab.p1 AND dup.p2 = tab.p2
   AND dup.ctid < tab.ctid;
CREATE UNIQUE INDEX tab_idx ON tab USING BTREE (p1, p2);

or am I missing something?
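
As a rough sanity check on how much work that DELETE has to do, something
like this counts the offending pairs beforehand (plain GROUP BY, nothing
exotic):

SELECT p1, p2, count(*) AS copies
  FROM tab
 GROUP BY p1, p2
HAVING count(*) > 1;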

Thanks, Lee.

--
Lee Kindness, Senior Software Engineer, Concept Systems Limited.
http://services.csl.co.uk/ http://www.csl.co.uk/ +44 131 5575595
