Re: Copy From & Insert UNLESS

From: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>, "James William Pye" <pgsql(at)jwp(dot)name>
Cc: "Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Copy From & Insert UNLESS
Date: 2006-02-06 22:08:38
Message-ID: C00D3316.C788%agoldshuv@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Alon Goldshuv on Bizgres has been working on this as well. Maybe you
> could collaborate? Alon?

I would love to collaborate. The proposal is neat, however, I am not too
excited about handling errors in such high granularity, as far as the user
is concerned. I am more on the same line with Tom Lane's statement in
Simon's thread (Practical error logging for very large COPY statements):

"The general problem that needs to be solved is "trap any error that
occurs during attempted insertion of a COPY row, and instead of aborting
the copy, record the data and the error message someplace else". Seen
in that light, implementing a special path for uniqueness violations is
pretty pointless."

But, I definitely share your struggle to finding a good way to handle those
unique/FK constraints...

Out of your current possible known solutions list:

. Temporary table that filters out the evil tuples.
. BEFORE TRIGGER handling the tuple if the constraint of interest is
violated.
. INSERT wrapped in a subtransaction.
. (Other variations)

I really don't like Temporary tables (too much user intervention) or
subtransactions (sloooow). I also don't like using pg_loader for that
manner, as although it's a nice tool, isolating errors with it for large
data sets is impractical.

I guess the BEFORE TRIGGER is the closest solution to what I would like to
achieve. I think something can be done even without a trigger. We could trap
any of the following:

- bad data (any error before the tuple can be created).
- domain constraints
- check constraints
- NOT NULL constraints

As far as UNIQUE goes, maybe there is a good way to do a bt scan against the
index table right before the simple_heap_insert call? Hopefully without too
much code duplication. I am not too familiar with that code, so I don't have
a very specific idea yet. I don't know how much slower things will become
with this extra scan (I would think it will still be simpler and faster than
a subtransaction), but I figure that there is a price to pay if you want
single row error isolation. Otherwise, if the user wants to run COPY like it
is currently (all data rows or nothing) they could still do it in the same
speed using the current code path, bypassing the extra scan.

Not sure this way very helpful, but these are my thoughts at this moment.

Regards,
Alon.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephan Szabo 2006-02-06 22:17:06 Re: Copy From & Insert UNLESS
Previous Message James William Pye 2006-02-06 21:46:10 Re: Copy From & Insert UNLESS