Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pavel(dot)stehule(at)gmail(dot)com
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2022-08-22 12:46:56
Message-ID: dc876e926f284ee60eef4ece755fa419@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2022-08-15 22:23, Damir Belyalov wrote:
>> I expected I could see 9999 rows after COPY, but just saw 999 rows.
> Also I implemented your case and it worked correctly.

Thanks for the new patch!

Here are some comments on it.

> + if (safecstate->saved_tuples < REPLAY_BUFFER_SIZE)
> + {
> + valid_row = NextCopyFrom(cstate, econtext, values,
> nulls);
> + if (valid_row)
> + {
> + /* Fill replay_buffer in oldcontext*/
> + MemoryContextSwitchTo(safecstate->oldcontext);
> +
> safecstate->replay_buffer[safecstate->saved_tuples++] =
> heap_form_tuple(RelationGetDescr(cstate->rel), values, nulls);
>
> + /* Buffer was filled, commit subtransaction and prepare
> to replay */
> + ReleaseCurrentSubTransaction();

What is actually being committed by this ReleaseCurrentSubTransaction()?
It seems to me that just safecstate->replay_buffer is fulfilled before
this commit.
As a test, I rewrote this ReleaseCurrentSubTransaction() to
RollbackAndReleaseCurrentSubTransaction() and COPYed over 1000 rows of
data, but same data were loaded.

> +#define REPLAY_BUFFER_SIZE 1000

I feel it might be better to have it as a parameter rather than fixed at
1000.

> +/*
> + * Analog of NextCopyFrom() but ignore rows with errors while copying.
> + */
> +static bool
> +safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum
> *values, bool *nulls)

NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in
copyfrom.c.
Since safeNextCopyFrom() is analog of NextCopyFrom() as commented, would
it be natural to put them in the same file?

> 188 + safecstate->errors++;
> 189 + if (safecstate->errors <= 100)
> 190 + ereport(WARNING,
> 191 + (errcode(errdata->sqlerrcode),
> 192 + errmsg("%s", errdata->context)));

It would be better to state in the manual that errors exceeding 100 are
not displayed.
Or, it might be acceptable to output all errors that exceed 100.

> +typedef struct SafeCopyFromState
> +{
> +#define REPLAY_BUFFER_SIZE 1000
> + HeapTuple replay_buffer[REPLAY_BUFFER_SIZE]; /* accumulates
> tuples for replaying it after an error */
> + int saved_tuples; /* # of tuples in
> replay_buffer */
> + int replayed_tuples; /* # of tuples was
> replayed from buffer */
> + int errors; /* total # of errors */
> + bool replay_is_active;
> + bool begin_subtransaction;
> + bool processed_remaining_tuples; /* for case of
> replaying last tuples */
> + bool skip_row;

It would be helpful to add comments about skip_row, etc.

```
$ git apply ../patch/0003-COPY_IGNORE_ERRORS.patch

../patch/0003-COPY_IGNORE_ERRORS.patch:86: indent with spaces.
Datum *values, bool *nulls);
warning: 1 line adds whitespace errors.
```

There was a warning when applying the patch.

```
=# copy test from '/tmp/10000.data' with (ignore_errors);
WARNING: FIND 0 ERRORS
COPY 1003
```

When there were no errors, this WARNING seems not necessary.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2022-08-22 12:48:56 [PATCH] Tab completion for SET COMPRESSION
Previous Message Aleksander Alekseev 2022-08-22 12:34:25 [PATCH] ALTER TABLE ... SET STORAGE default