Re: NOLOGGING option, or ?

From: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
To: "Alvaro Herrera" <alvherre(at)surnet(dot)cl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: NOLOGGING option, or ?
Date: 2005-06-02 16:54:56
Message-ID: BEC487F0.51BA%agoldshuv@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> 2) A modified command syntax for introducing a direct single row error
>> handling. By direct I mean - a row that if rejected from within the COPY
>> command context does not throw an error and rollsback the whole transaction.
>> Instead the error is caught and recorded elsewhere, maybe in some error
>> table, with some more information that can later on be retrieved. The
>> following rows continue to be processed. This way there is barely any error
>> handling overhead.
>
> Is there any idea on exactly how would this be done? Do you plan on
> using savepoints to implement it? I fail to see how is this "barely any
> overhead". Savepoints are not that expensive but they are not free either.
> (No, I haven't measured it.)

Good question, I am not entirely sure if this is possible yet, as I didn't
think it through entirely yet.

I guess data errors could be divided into 2 main categories: mal-formed data
where error is detected even before forming a tuple, and the other is errors
that are caused by some constraint violation, that is, after the tuple is
formed and inserted.

From what I hear and experience the big majority of errors are of the first
type. In that case the error could be caught, the data line + line number +
error description could be inserted into an ERROR table (all TEXT fields),
and then COPY can skip forming a tuple, and move to parsing the next. In
this process there is barely any overhead.

The more difficult part obviously is handling the second error type, which I
haven't looked at yet deeply. Hopefully it is not impossible to do while
keeping transaction integrity (Any ideas anyone?). The overhead for this one
will probably be larger, but again, we expect those to happen less (in most
cases at least). Nevertheless, it is surely much faster than recursively
narrowing down batch sizes.

Alon.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2005-06-02 18:15:16 Re: NOLOGGING option, or ?
Previous Message vamsi krishna 2005-06-02 15:43:26 need help