Re: Bulkloading using COPY - ignore duplicates?

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bulkloading using COPY - ignore duplicates?
Date: 2001-12-13 18:20:18
Message-ID: Pine.LNX.4.30.0112131714310.647-100000@peter.localdomain
Lists: pgsql-hackers

Lee Kindness writes:

> 1. Performance enhancements when doing bulk inserts - pre or
> post processing the data to remove duplicates is very time
> consuming. Likewise the best tool should always be used for the job
> at hand, and for searching/removing things it's a database.

Arguably, a better tool for this is sort(1). For instance, if you have a
typical copy input file with tab-separated fields and the primary key is
in columns 1 and 2, you can remove duplicates with

sort -k 1,2 -u INFILE > OUTFILE

To get a record of what duplicates were removed, use diff.
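For example, one way to capture the removed lines (assuming the same sort
keys; the file names here are just placeholders):

sort -k 1,2 INFILE > SORTED
sort -k 1,2 -u INFILE > OUTFILE
diff SORTED OUTFILE

Lines prefixed with '<' in the diff output are the duplicates that were
dropped.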

--
Peter Eisentraut peter_e(at)gmx(dot)net
