Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>, Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, zhihuifan1213(at)163(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2023-12-18 07:41:19
Message-ID: CACJufxEPOPfMbTPsuj91_1ra2hXJMHw_jMFnGiLBrCcE5Q_Bng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 18, 2023 at 1:09 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
> Hi,
>
> > save the error metadata to system catalogs would be more expensive,
> > please see below explanation.
> > I have no knowledge of publications.
> > but i feel there is a feature request: publication FOR ALL TABLES
> > exclude regex_pattern.
> > Anyway, that would be another topic.
>
> I think saving error metadata to system catalog is not a good idea, too.
> And I believe Sawada-san just pointed out missing features and did not
> suggested that we use system catalog.
>
> > I don't think "specify the maximum number of errors to tolerate
> > before raising an ERROR." is very useful....
>
> That may be so.
> I imagine it's useful in some use case since some loading tools have
> such options.
> Anyway I agree it's not necessary for initial patch as mentioned in [1].
>
> > I suppose we can specify an ERRORFILE directory. similar
> > implementation [2], demo in [3]
> > it will generate 2 files, one file shows the malform line content as
> > is, another file shows the error info.
>
> That may be a good option when considering "(2) logging errors to
> somewhere".
>
> What do you think about the proposal to develop these features in
> incrementally?
>

I am more with tom's idea [1], that is when errors happen (data type
conversion only), do not fail, AND we save the error to a table. I
guess we can implement this logic together, only with a new COPY
option.

imagine a case (it's not that contrived, imho), while conversion from
text to table's int, postgres isspace is different from the source
text file's isspace logic.
then all the lines are malformed. If we just say on error continue and
not save error meta info, the user is still confused which field has
the wrong data, then the user will probably try to incrementally test
which field contains malformed data.

Since we need to save the error somewhere.
Everyone has the privilege to INSERT can do COPY.
I think we also need to handle the access privilege also.
So like I mentioned above, one copy_error error table hub, then
everyone can view/select their own copy failure record.

but save to a server text file/directory, not easy for an INSERT
privilege user to see these files, I think.
similarly not easy to see these failed records in log for limited privilege.

if someone wants to fail at maxerror rows, they can do it, since we
will count how many rows failed.
even though I didn't get it.

[1] https://www.postgresql.org/message-id/900123.1699488001%40sss.pgh.pa.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2023-12-18 08:32:08 Re: introduce dynamic shared memory registry
Previous Message Hayato Kuroda (Fujitsu) 2023-12-18 07:40:43 RE: [PoC] pg_upgrade: allow to upgrade publisher node