Re: Bulk INSERT with individual failure

From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Bulk INSERT with individual failure
Date: 2013-01-14 06:26:23
Message-ID: 50F3A50F.9070502@darrenduncan.net
Lists: pgsql-general

On 2013.01.13 5:58 PM, Robert James wrote:
> I need to INSERT a large number of records. For performance reasons,
> I'd rather send them to Postgres in one giant INSERT.
>
> However, if there's a problem in one record (eg one row doesn't meet a
> constraint), I'd still like the others saved. That is, I specifically
> DO NOT want atomic behavior. It's okay to silently drop bad data in
> this case - I don't even need to know about it.
>
> Is there any way to do this, or am I forced to put each record into
> its own INSERT?

Here's the best way:

1. Create a temporary staging table and bulk-insert all your data into it. This
table would resemble the actual destination and have slots to hold all the data,
but with weaker constraints, e.g. no unique/primary-key or foreign-key
constraints, so that your raw data is guaranteed to be accepted.

2. Use all the nice data-analysis tools that SQL gives you and perform an
INSERT ... SELECT ... into the actual destination from the staging table,
applying whatever filters, tests, cleanups, or joins with other tables (such as
the destination table itself) you need, so as to preemptively take care of
anything that would have caused a constraint failure. A sketch follows below.
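
For illustration only, here is a minimal, untested sketch of both steps, assuming
a hypothetical destination table items (id integer primary key, name text not
null); the table and column names are just placeholders for your own schema:

-- Step 1: staging table with the same columns but no constraints,
-- so every raw row is accepted.
CREATE TEMPORARY TABLE items_staging (
    id   integer,
    name text
);

-- Bulk-load the raw data (COPY or one big multi-row INSERT).
INSERT INTO items_staging (id, name) VALUES
    (1, 'alpha'),
    (2, 'beta'),
    (2, 'beta again'),     -- would violate the destination's primary key
    (NULL, 'missing id');  -- would violate NOT NULL on the destination

-- Step 2: copy over only the rows that will satisfy the destination's
-- constraints; everything else is silently dropped.
INSERT INTO items (id, name)
SELECT DISTINCT ON (s.id) s.id, s.name
FROM items_staging s
WHERE s.id IS NOT NULL
  AND s.name IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM items d WHERE d.id = s.id)
ORDER BY s.id;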

Modify to taste.

In fact, this is my generally recommended method for doing any kind of bulk data
import, because it's much easier to clean data using SQL than otherwise, and it's
all very efficient resource-wise.

-- Darren Duncan
