Re: Bulkloading using COPY - ignore duplicates?

From: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>, Jim Buttafuoco <jim(at)buttafuoco(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bulkloading using COPY - ignore duplicates?
Date: 2001-12-18 10:09:14
Message-ID: 15391.5578.336203.295826@elsick.csl.co.uk
Lists: pgsql-hackers

Peter Eisentraut writes:
> Lee Kindness writes:
> > Consider SELECT DISTINCT - which is the 'duplicate' and which one is
> > the good one?
> It's not the same thing. SELECT DISTINCT only eliminates rows that are
> completely the same, not only equal in their unique constraints.
> Maybe you're thinking of SELECT DISTINCT ON (). Observe the big warning
> that the results of that statement are random unless ORDER BY is used. --
> But that's not the same thing either. We've never claimed that the COPY
> input has an ordering assumption. In fact you're asking for a bit more
> than an ordering assumption, you're saying that the earlier data is better
> than the later data. I think in a random use case that is more likely
> *not* to be the case because the data at the end is newer.

You're right - I meant 'SELECT DISTINCT ON ()'. However, I'm only
using it as an example of where the database chooses (be it
randomly) which data to discard. While I've said in this thread that
'COPY FROM IGNORE DUPLICATES' would ignore later duplicates, I'm not
really that concerned about which ones it ignores; first, later, random,
... I agree that if it were of concern then the data should be pre-processed.
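
For anyone who does care which duplicate survives, here is a minimal
sketch of that kind of pre-processing, done before the data ever reaches
COPY. It assumes tab-delimited COPY text input with the unique key in
the first column; the function name and its parameters are hypothetical
and not part of PostgreSQL or of any proposed patch.

```python
def drop_duplicate_keys(lines, key_columns=(0,), keep="first", delimiter="\t"):
    """Split input lines into (kept, discarded), deduplicating on key_columns.

    Assumption: each line is one tab-delimited row, as in COPY text format.
    keep="first" retains the earliest row for a key, keep="last" the latest.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")

    chosen = {}      # key -> row currently selected for that key
    discarded = []   # rows that lost to another row with the same key

    for line in lines:
        row = line.rstrip("\n")
        fields = row.split(delimiter)
        key = tuple(fields[i] for i in key_columns)

        if key not in chosen:
            chosen[key] = row
        elif keep == "first":
            # An earlier row already owns this key; drop the newcomer.
            discarded.append(row)
        else:
            # The newcomer replaces the earlier row, which is dropped.
            discarded.append(chosen[key])
            chosen[key] = row

    # Kept rows come out ordered by the first appearance of each key.
    return list(chosen.values()), discarded
```

Called on the rows "1<TAB>a", "2<TAB>b", "1<TAB>c" with the default
keep="first", this keeps the "a" and "b" rows and discards the "c" row;
with keep="last" it keeps "c" and "b" and discards "a". The discarded
list can be written to a file if a record of the rejects is wanted.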

> Btw., here's another concern about this proposed feature: If I do
> a client-side COPY, how will you send the "ignored" rows back to
> the client?

Again, a number of different ideas have been mixed up in the
discussion. Oracle's logging option was only given as an example of
how other database systems deal with this. If no such option is
explicitly given then it's reasonable to discard the extra
information.

What really would be nice in the SQL-world is a standardised COPY
statement...

Best regards, Lee Kindness.
