Re: Small fix on COPY ON_ERROR document

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Small fix on COPY ON_ERROR document
Date: 2024-01-26 09:09:52
Message-ID: 20240126180952.b80658a541700779b533f59a@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 25 Jan 2024 23:33:22 -0700
"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> On Thu, Jan 25, 2024 at 10:40 PM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote:
>
> > On Fri, 26 Jan 2024 13:59:09 +0900
> > Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > > On Fri, Jan 26, 2024 at 11:28 AM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
> > wrote:
> > > >
> > > > Hi,
> > > >
> > > > I found that the documentation of COPY ON_ERROR said
> > > > COPY stops operation at the first error when
> > > > "ON_ERROR is not specified.", but it also stop when
> > > > ON_ERROR is specified to the default value, "stop".
> > > >
> >
> >
> I'm finding the current wording surrounding ON_ERROR to not fit in with the
> rest of the page. I've tried to improve things by being a bit more
> invasive.

Introducing the ON_ERROR option changed the behavior of COPY about error
handling to some extent, so it might be good to rewrite a bit more parts
of the documentation as you suggest.

> This first hunk updates the description of the COPY command to include
> describing the purpose of the ON_ERROR clause.
>
> I too am concerned about the word "parsed" here, and "malformed" in the
> more detailed descriptions; this would need to be modified to better
> reflect the circumstances under which ignore happens.

I think "errors due to data type incompatibility" would be nice to describe
the errors to be ignored by ON_ERROR, as used in the NOTICE message output.

> diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
> index 21a5c4a052..5fe4c9f747 100644
> --- a/doc/src/sgml/ref/copy.sgml
> +++ b/doc/src/sgml/ref/copy.sgml
> @@ -90,6 +90,14 @@ COPY { <replaceable
> class="parameter">table_name</replaceable> [ ( <replaceable
> in the <structname>pg_stat_progress_copy</structname> view. See
> <xref linkend="copy-progress-reporting"/> for details.
> </para>
> +
> + <para>
> + By default, <command>COPY</command> will abort if it encounters errors.
> + For non-binary format <command>COPY FROM</command> the user can specify
> + to instead ignore input rows that cannot be parsed by specifying
> + the <literal>ignore</literal> option to the <literal>ON_ERROR</literal>
> + clause.
> + </para>
> </refsect1>
>

>
> The following was, IMO, too much text about something not to worry about,
> COPY TO and ignore mode. The point is dead tuples on error and running
> vacuum. It doesn't really even matter what caused the error, though if the
> error was even before rows started to be imported then obviously there
> would be no dead tuples.
>
> Oh, and the word "first" really isn't adding anything here that we cannot
> reasonably leave to common sense, IMO. We either ignore all errors or stop
> on the first one. There isn't a stop mode that is capable of bypassing
> errors and the ignore mode doesn't have a "first n" option so one assumes
> all errors are ignored.
>
> @@ -576,15 +583,12 @@ COPY <replaceable
> class="parameter">count</replaceable>
> </para>
>
> <para>
> - <command>COPY</command> stops operation at the first error when
> - <literal>ON_ERROR</literal> is not specified. This
> - should not lead to problems in the event of a <command>COPY
> - TO</command>, but the target table will already have received
> - earlier rows in a <command>COPY FROM</command>. These rows will not
> - be visible or accessible, but they still occupy disk space. This might
> - amount to a considerable amount of wasted disk space if the failure
> - happened well into a large copy operation. You might wish to invoke
> - <command>VACUUM</command> to recover the wasted space.
> + A failed <command>COPY FROM</command> command will have performed
> + physical insertion of all rows prior to the malformed one.

As you said that it does not matter what caused the error, I don't think
we have to use "malformed" here. Instead, we could say "we will have
performed physical insertion of rows before the failure."

> + While these rows will not be visible or accessible, they still occupy
> + disk space. This might amount to a considerable amount of wasted disk
> + space if the failure happened well into a large copy operation.
> + <command>VACUUM</command> should be used to recover the wasted space.
> </para>
>
> <para>
>
> David J.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-01-26 09:22:41 Re: Fwd: BUG #18016: REINDEX TABLE failure
Previous Message Junwang Zhao 2024-01-26 09:02:23 Re: Make COPY format extendable: Extract COPY TO format implementations