Re: VLDB Features

From: Trent Shipley <trent_shipley(at)qwest(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: VLDB Features
Date: 2007-12-15 01:30:16
Message-ID: 200712141830.16877.trent_shipley@qwest.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday 2007-12-14 16:22, Tom Lane wrote:
> Neil Conway <neilc(at)samurai(dot)com> writes:
> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
> > to drop (and log) rows that contain malformed data. That is, rows with
> > too many or too few columns, rows that result in constraint violations,
> > and rows containing columns where the data type's input function raises
> > an error. The last case is the only thing that would be a bit tricky to
> > implement, I think: you could use PG_TRY() around the InputFunctionCall,
> > but I guess you'd need a subtransaction to ensure that you reset your
> > state correctly after catching an error.
>
> Yeah. It's the subtransaction per row that's daunting --- not only the
> cycles spent for that, but the ensuing limitation to 4G rows imported
> per COPY.

You could extend the COPY FROM syntax with a COMMIT EVERY n clause. This
would help with the 4G subtransaction limit. The cost to the ETL process is
that a simple rollback would not be guaranteed send the process back to it's
initial state. There are easy ways to deal with the rollback issue though.

A {NO} RETRY {USING algorithm} clause might be useful. If the NO RETRY
option is selected then the COPY FROM can run without subtransactions and in
excess of the 4G per transaction limit. NO RETRY should be the default since
it preserves the legacy behavior of COPY FROM.

You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not give the
option of sending exceptions to a table since they are presumably malformed,
otherwise they would not be exceptions. (Users should re-process exception
files if they want an if good then table a else exception to table b ...)

EXCEPTIONS TO and NO RETRY would be mutually exclusive.

> If we could somehow only do a subtransaction per failure, things would
> be much better, but I don't see how.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2007-12-15 04:34:57 Re: VLDB Features
Previous Message Tom Lane 2007-12-15 01:03:56 Re: VLDB Features