Re: Practical error logging for very large COPY

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Practical error logging for very large COPY
Date: 2005-11-22 08:07:08
Message-ID: 1132646828.4959.507.camel@localhost.localdomain
Lists: pgsql-hackers

On Mon, 2005-11-21 at 19:38 -0500, Andrew Dunstan wrote:
>
> Tom Lane wrote:
>
> >Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> >
> >
> >>What I'd like to do is add an ERRORTABLE clause to COPY. The main
> >>problem is how we detect a duplicate row violation, yet prevent it from
> >>aborting the transaction.
> >>
> >If this only solves the problem of duplicate keys, and not any other
> >kind of COPY error, it's not going to be much of an advance.
> >

> Yeah, and I see errors from bad data as often as from violating
> constraints. Maybe the best way if we do something like this would be to
> have the error table contain a single text, or maybe bytea, field which
> contained the raw offending input line.

I have committed the sin of omission again.

Duplicate row violation is the big challenge, but not the only function
planned. Formatting errors occur much more frequently, so yes we'd want
to log all of that too. And yes, it would be done in the way you
suggest.

Here's a fuller, but still brief sketch:

COPY ... FROM ...
    [ERRORTABLES format1 [uniqueness1]
    [ERRORLIMIT percent]]

where Format1 and Uniqueness1 would be created afresh by this command
(which would raise an error if they already exist).

Format1 would hold formatting errors, so it would be a blob table with
columns (line number, col number, error number, fullrowstring).

Uniqueness1 would have the same definition as the target table, but with
no indexes. This table would be optional; omitting it would indicate that
no uniqueness violation checks need to be carried out. If it is present
and yet no unique indexes exist on the target table, then Uniqueness1
would be ignored (and not created).
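
To make the division of labour between the two error tables concrete, here is a rough sketch in Python rather than backend C. It is only an illustration of the routing described above, not a proposed implementation: the function name copy_with_error_tables, the tab delimiter, the single key column, and the error number 1 (meaning "wrong column count") are all assumptions made up for the example, and an in-memory set stands in for a unique index.

```python
from dataclasses import dataclass


@dataclass
class FormatError:
    # Mirrors the suggested Format1 columns.
    line_number: int
    col_number: int
    error_number: int   # hypothetical code; 1 = wrong column count
    full_row: str


def copy_with_error_tables(lines, n_cols, key_col=0, check_unique=True):
    """Load tab-delimited lines, diverting bad rows instead of aborting."""
    loaded, format1, uniqueness1 = [], [], []
    seen_keys = set()  # stands in for a unique index on the target table
    for line_number, raw in enumerate(lines, start=1):
        fields = raw.split("\t")
        if len(fields) != n_cols:
            # Report the first column position that is missing or surplus.
            col_number = min(len(fields), n_cols) + 1
            format1.append(FormatError(line_number, col_number, 1, raw))
            continue
        row = tuple(fields)
        if check_unique:
            if row[key_col] in seen_keys:
                # Duplicate key: keep the whole row, same shape as the table.
                uniqueness1.append(row)
                continue
            seen_keys.add(row[key_col])
        loaded.append(row)
    return loaded, format1, uniqueness1


rows = ["1\ta", "2\tb", "1\tc", "bad", "3\td\textra"]
loaded, format1, uniqueness1 = copy_with_error_tables(rows, n_cols=2)
```

Passing check_unique=False corresponds to leaving out the Uniqueness1 table: no duplicate checks are made in the sketch, and only formatting errors are diverted.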

ERRORLIMIT percent would abort the COPY if more than percent of the
records read so far were in error, with the check applied only after the
first 1000 records (that threshold could also be stated explicitly if
required).
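
One possible reading of the ERRORLIMIT rule, again as a small Python sketch: the function name exceeds_error_limit and the min_rows parameter are hypothetical, and the choice to compare errors against all rows read so far (rather than against some other base) is an assumption, since the sketch above does not pin that down.

```python
def exceeds_error_limit(rows_read, errors, percent, min_rows=1000):
    """Return True if the load should abort under the assumed ERRORLIMIT rule."""
    if rows_read <= min_rows:
        # Too few rows to judge the error rate; never abort this early.
        return False
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return errors * 100 > percent * rows_read
```

With percent set to 5, a load that has read 2000 rows would carry on at 100 errors and abort at 101, while a load that has read only 500 rows would carry on regardless of how many of them were bad.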

Without the ERRORTABLES clause, COPY would work exactly as it does now.

How does that sound?

Best Regards, Simon Riggs
