Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>, sawada(dot)mshk(at)gmail(dot)com
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>, Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, zhihuifan1213(at)163(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2024-01-12 02:58:44
Message-ID: 6076bebc06624713efda7421f8ed1ad0@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 10, 2024 at 4:42 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> Yeah, I'm still thinking it's better to implement this feature
> incrementally. Given we're closing to feature freeze, I think it's
> unlikely to get the whole feature into PG17 since there are still many
> design discussions we need in addition to what Torikoshi-san pointed
> out. The feature like "ignore errors" or "logging errors" would have
> higher possibilities. Even if we get only these parts of the whole
> "error table" feature into PG17, it will make it much easier to
implement "error tables" feature.

+1.
I'm also going to make patch for "logging errors", since this
functionality is isolated from v7 patch.

> Seems promising. I'll look at the patch.
Thanks a lot!
Sorry to attach v2 if you already reviewed v1..

On 2024-01-11 12:13, jian he wrote:
> On Tue, Jan 9, 2024 at 10:36 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
> wrote:
>>
>> On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada
>> <sawada(dot)mshk(at)gmail(dot)com>
>> wrote:
>> > If we want only such a feature we need to implement it together (the
>> > patch could be split, though). But if some parts of the feature are
>> > useful for users as well, I'd recommend implementing it incrementally.
>> > That way, the patches can get small and it would be easy for reviewers
>> > and committers to review/commit them.
>>
>> Jian, how do you think this comment?
>>
>> Looking back at the discussion so far, it seems that not everyone
>> thinks
>> saving table information is the best idea[1] and some people think
>> just
>> skipping error data is useful.[2]
>>
>> Since there are issues to be considered from the design such as
>> physical/logical replication treatment, putting error information to
>> table is likely to take time for consensus building and development.
>>
>> Wouldn't it be better to follow the following advice and develop the
>> functionality incrementally?
>>
>> On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada
>> <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> > So I'm thinking we may be able to implement this
>> > feature incrementally. The first step would be something like an
>> > option to ignore all errors or an option to specify the maximum number
>> > of errors to tolerate before raising an ERROR. The second step would
>> > be to support logging destinations such as server logs and tables.
>>
>>
>> Attached a patch for this "first step" with reference to v7 patch,
>> which
>> logged errors and simpler than latest one.
>> - This patch adds new option SAVE_ERROR_TO, but currently only
>> supports
>> 'none', which means just skips error data. It is expected to support
>> 'log' and 'table'.
>> - This patch Skips just soft errors and don't handle other errors such
>> as missing column data.
>
> Hi.
> I made the following change based on your patch
> (v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch)
>
> * when specified SAVE_ERROR_TO, move the initialization of
> ErrorSaveContext to the function BeginCopyFrom.
> I think that's the right place to initialize struct CopyFromState
> field.
> * I think your patch when there are N rows have malformed data, then it
> will initialize N ErrorSaveContext.
> In the struct CopyFromStateData, I changed it to ErrorSaveContext
> *escontext.
> So if an error occurred, you can just set the escontext accordingly.
> * doc: mention "If this option is omitted, <command>COPY</command>
> stops operation at the first error."
> * Since we only support 'none' for now, 'none' means we don't want
> ErrorSaveContext metadata,
> so we should set cstate->escontext->details_wanted to false.
>
>> BTW I have question and comment about v15 patch:
>>
>> > + {
>> > + /*
>> > + *
>> > + * InputFunctionCall is more faster than
>> > InputFunctionCallSafe.
>> > + *
>> > + */
>>
>> Have you measured this?
>> When I tested it in an older patch, there were no big difference[3].
> Thanks for pointing it out, I probably was over thinking.
>
>> > - SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P
>> SECURITY
>> SELECT
>> > + SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH
>> SECOND_P
>> SECURITY SELECT
>>
>> There was a comment that we shouldn't add new keyword for this[4].
>>
> Thanks for pointing it out.

Thanks for reviewing!

Updated the patch merging your suggestions except below points:

> + cstate->num_errors = 0;

Since cstate is already initialized in below lines, this may be
redundant.

| /* Allocate workspace and zero all fields */
| cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));

> + Assert(!cstate->escontext->details_wanted);

I'm not sure this is necessary, considering we're going to add other
options like 'table' and 'log', which need details_wanted soon.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment Content-Type Size
v2-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch text/x-diff 15.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-01-12 03:12:39 Re: Synchronizing slots from primary to standby
Previous Message Kyotaro Horiguchi 2024-01-12 02:56:11 Re: initdb --no-locale=C doesn't work as specified when the environment is not C