Re: Add new error_action COPY ON_ERROR "log"

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, jian(dot)universality(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add new error_action COPY ON_ERROR "log"
Date: 2024-02-12 06:46:41
Message-ID: CALj2ACUk700cYhx1ATRQyRw-fBM+aRo6auRAitKGff7XNmYfqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 29, 2024 at 8:41 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2024-01-27 00:43, David G. Johnston wrote:
>
> >> I'd like to have a new option "log", which skips soft errors and
> >> logs
> >> information that should have resulted in errors to PostgreSQL log.
> >
> > user-specified tables in the same database. Setting up temporary
> > tables or unlogged tables probably is going to be a more acceptable
> > methodology than trying to get to the log files.
>
> OTOH not everyone thinks saving table information is the best idea[2].

The added NOTICE message gives a summary of how many rows were
skipped, but not the line numbers. It's hard for the users to find the
malformed data, especially when they are bulk-inserting from data
files of multiple GBs in size (hard to open such files in any editor
too).

I understand the value of writing the info to server log or tables of
users' choice as being discussed in a couple of other threads.
However, I'd prefer we do the simplest thing first before we go debate
server log or table - let the users know what line numbers are
containing the errors that COPY ignored something like [1] with a
simple change like [2]. It not only helps users figure out which rows
and attributes were malformed, but also helps them redirect them to
server logs with setting log_min_messages = notice [3]. In the worst
case scenario, a problem with this one NOTICE per malformed row is
that it can overload the psql session if all the rows are malformed.
I'm not sure if this is a big problem, but IMO better than a single
summary NOTICE message and simpler than writing to tables of users'
choice.

Thoughts?

FWIW, I presented the new COPY ... ON_ERROR option feature at
Hyderabad PostgreSQL User Group meetup and it was well-received by the
audience. The audience felt it's going to be extremely helpful for
bulk-loading tasks. Thank you all for working on this feature.

[1]
postgres=# CREATE TABLE check_ign_err (n int, m int[], k int);
CREATE TABLE
postgres=# COPY check_ign_err FROM STDIN WITH (on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1 {1} 1
a {2} 2
3 {3} 3333333333
4 {a, 4} 4

5 {5}>> >> >> >> >> 5
\.>>
NOTICE: detected data type incompatibility at line number 2 for
column check_ign_err, COPY n
NOTICE: detected data type incompatibility at line number 3 for
column check_ign_err, COPY k
NOTICE: detected data type incompatibility at line number 4 for
column check_ign_err, COPY m
NOTICE: detected data type incompatibility at line number 5 for
column check_ign_err, COPY n
NOTICE: 4 rows were skipped due to data type incompatibility
COPY 2

[2]
diff --git a/src/backend/commands/copyfromparse.c
b/src/backend/commands/copyfromparse.c
index 906756362e..93ab5d4ebb 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -961,7 +961,16 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,

&values[m]))
{
cstate->num_errors++;
- return true;
+
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
+ {
+ ereport(NOTICE,
+
errmsg("detected data type incompatibility at line number %llu for
column %s, COPY %s",
+
(unsigned long long) cstate->cur_lineno,
+
cstate->cur_relname,
+
cstate->cur_attname));
+ return true;
+ }
}

cstate->cur_attname = NULL;

[3]
2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type
incompatibility at line number 2 for column check_ign_err, COPY n
2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err,
line 2, column n: "a"
2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type
incompatibility at line number 3 for column check_ign_err, COPY k
2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err,
line 3, column k: "3333333333"
2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type
incompatibility at line number 4 for column check_ign_err, COPY m
2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err,
line 4, column m: "{a, 4}"
2024-02-12 06:20:29.363 UTC [427892] NOTICE: detected data type
incompatibility at line number 5 for column check_ign_err, COPY n
2024-02-12 06:20:29.363 UTC [427892] CONTEXT: COPY check_ign_err,
line 5, column n: ""
2024-02-12 06:20:29.363 UTC [427892] NOTICE: 4 rows were skipped due
to data type incompatibility

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-02-12 07:08:00 Re: RFC: Logging plan of the running query
Previous Message Andrei Lepikhov 2024-02-12 05:52:13 Re: POC, WIP: OR-clause support for indexes