
Re: Bulkloading using COPY - ignore duplicates?

From: Daniel Kalchev <daniel(at)digsys(dot)bg>
To: "Vadim Mikheev" <vmikheev(at)sectorbase(dot)com>
Cc: "PostgreSQL Development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bulkloading using COPY - ignore duplicates?
Date: 2002-01-04 08:07:21
Message-ID:
Lists: pgsql-hackers
>>>"Vadim Mikheev" said:
 > 1. I prefer Oracle's (and others', I believe) way: put statement(s) in a PL
 > block and define which actions should be taken for which exceptions
 > (errors) (i.e. IGNORE for error, etc.).

Some people prefer 'pure' SQL. Anyway, it can be argued which is worse: using a 
non-SQL language, or extending the SQL language. I guess the SQL standard does 
not provide for such functionality?
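For what it's worth, the Oracle-style approach Vadim describes might look 
roughly like this in a PL/pgSQL-flavoured block (a sketch only; the `staging` 
and `target` tables and their columns are hypothetical, and the per-row 
exception trap shown is the kind of construct being discussed, not something 
PostgreSQL necessarily supports today):

```sql
-- Sketch: trap the duplicate-key error per row inside a procedural
-- block, so one bad row does not abort the whole load.
CREATE FUNCTION load_ignore_dups() RETURNS integer AS '
DECLARE
    r RECORD;
    n integer := 0;
BEGIN
    FOR r IN SELECT * FROM staging LOOP
        BEGIN
            INSERT INTO target (id, val) VALUES (r.id, r.val);
            n := n + 1;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- ignore the duplicate and continue with the next row
        END;
    END LOOP;
    RETURN n;  -- number of rows actually inserted
END;
' LANGUAGE plpgsql;
```

The price of this style is a round trip through the executor per row, which is 
exactly what bulkloading with COPY tries to avoid.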

 > 2. For an INSERT ... SELECT statement one can put DISTINCT in the SELECT's
 > target list.

With this construct you are effectively copying rows from one table to 
another, or constructing rows from various sources (constants, other tables, 
etc.) and inserting them into the table. If the target table has unique 
indexes (or constraints) and some of the rows returned by the SELECT violate 
those restrictions, you are supposed to get errors, and unfortunately the 
entire INSERT is aborted. I fail to see how DISTINCT can help here: it only 
removes duplicates within the SELECT result, and does nothing about rows that 
already exist in the destination table. Perhaps it is possible to include a 
check for already existing tuples in the destination table in the SELECT, but 
this will significantly increase the runtime, especially when the destination 
table is huge.
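Such a check in the SELECT itself would look something like this (a sketch; 
`src`, `dst`, and the `id` key column are hypothetical):

```sql
-- Insert only rows whose key is not already in the destination.
-- This avoids the unique-constraint error, but at the cost of a
-- lookup against dst for every candidate row.
INSERT INTO dst (id, val)
SELECT s.id, s.val
FROM src s
WHERE NOT EXISTS (SELECT 1 FROM dst d WHERE d.id = s.id);
```

Note this still does not protect against duplicates within src itself; those 
would need a DISTINCT ON the key as well.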

My idea is to let this INSERT statement insert as many of its rows as 
possible, perhaps returning NOTICEs or ignoring the errors (with an IGNORE 
ERRORS syntax, for example :)
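As a sketch of the kind of syntax I have in mind (purely hypothetical; no 
such clause exists in PostgreSQL or, as far as I know, in the SQL standard):

```sql
-- Hypothetical syntax: skip rows that violate a constraint
-- instead of aborting the whole statement.
INSERT INTO dst (id, val)
SELECT id, val FROM src
IGNORE ERRORS;
```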

I believe all this functionality will have to consider the syntax first.


