Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>, dam(dot)bel07(at)gmail(dot)com
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Danil Anisimow <anisimow(dot)d(at)gmail(dot)com>, HukuToc(at)gmail(dot)com, a(dot)lepikhov(at)postgrespro(dot)ru, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2023-03-07 02:07:07
Message-ID: 8ad8492ff9fae3481d87c7aab4e0aed0@oss.nttdata.com
Lists: pgsql-hackers

On 2023-03-06 23:03, Daniel Gustafsson wrote:
>> On 28 Feb 2023, at 15:28, Damir Belyalov <dam(dot)bel07(at)gmail(dot)com> wrote:
>
>> Tested patch on all cases: CIM_SINGLE, CIM_MULTI, CIM_MULTI_CONDITION.
>> As expected it works.
>> Also added a description to copy.sgml and made a review on patch.
Thanks for your tests and improvements!

>> I added 'ignored_errors' integer parameter that should be output after
>> the option is finished.
>> All errors were added to the system logfile with full detailed
>> context. Maybe it's better to log only error message.
Certainly.

> FWIW, Greenplum has a similar construct (but which also logs the errors
> in the db) where data type errors are skipped as long as the number of
> errors doesn't exceed a reject limit. If the reject limit is reached
> then the COPY fails:
>
> LOG ERRORS [ SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]]
>
> IIRC the gist of this was to catch when the user copies the wrong input
> data or plain has a broken file. Rather than finding out after copying
> n rows which are likely to be garbage the process can be restarted.
>
> This version of the patch has a compiler error in the error message:
>
> copyfrom.c: In function ‘CopyFrom’:
> copyfrom.c:1008:29: error: format ‘%ld’ expects argument of type ‘long int’, but argument 2 has type ‘uint64’ {aka ‘long long unsigned int’} [-Werror=format=]
> 1008 | ereport(WARNING, errmsg("Errors: %ld", cstate->ignored_errors));
>      |                         ^~~~~~~~~~~~  ~~~~~~~~~~~~~~~~~~~~~~
>      |                                              |
>      |                                              uint64 {aka long long unsigned int}
>
>
> On that note though, it seems to me that this error message leaves a
> bit to be desired with regards to the level of detail.
+1.
I felt that just logging "Error: %ld" would leave people wondering what
the %ld means. Logging something like "Error: %ld data type errors were
found" might be clearer.
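
For illustration only, here is a minimal standalone sketch of how such a
message could be formatted without tripping -Wformat. It is not the patch
code: the typedef stands in for PostgreSQL's uint64, and
format_ignored_errors() is a hypothetical helper invented for this
example. The idea is to cast the counter to unsigned long long and print
it with %llu, so the format matches regardless of whether the platform's
64-bit type is "long" or "long long".

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: stand-in for PostgreSQL's uint64 typedef. */
typedef uint64_t uint64;

/*
 * Hypothetical helper (not part of the patch): write the summary
 * message into buf and return snprintf()'s result.
 *
 * The explicit cast to unsigned long long paired with %llu keeps the
 * format string and the argument type in agreement on every platform,
 * which is what the "%ld" version failed to do.
 */
static int
format_ignored_errors(char *buf, size_t buflen, uint64 ignored_errors)
{
    return snprintf(buf, buflen,
                    "%llu data type errors were found",
                    (unsigned long long) ignored_errors);
}
```

Calling format_ignored_errors(buf, sizeof(buf), 3) fills buf with a
message that states what the number counts. In the patch itself the same
cast and %llu would presumably go straight into the errmsg() call.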

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
