Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
Cc: Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, zhihuifan1213(at)163(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2023-12-06 10:47:10
Message-ID: CACJufxH3KGOGhnS-J2bYnA-mpABg4HG9Xddfj9zBBgoK+iGSsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> wrote:
>
> Hi!
>
> Thank you for your contribution to this thread.
>
>
> I reviewed it and have a few questions.
>
> 1. I have seen that you delete a table before creating it, to which you want to add errors due to a failed "copy from" operation. I think this is wrong because this table can save useful data for the user.
> At a minimum, we should warn the user about this, but I think we can just add some number at the end of the name, such as name_table1, name_table_2.

Sorry. I don't understand this part.
Currently, if the error table name already exists, then the copy will
fail, an error will be reported.
I try to first create a table, if no error then the error table will be dropped.
Can you demo the expected behavior?

> 2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation.
> I think it would be better to use the name of the source file that was used while 'copy from' was running.
> In addition, there may be several such files, it is also worth considering.
>

Another column added.
now it looks like:

SELECT * FROM save_error_csv_error;
filename | lineno | line
| field | source | err_message |
err_detail | errorcode
----------+--------+----------------------------------------------------+-------+--------+---------------------------------------------+------------+-----------
STDIN | 1 | 2002 232 40 50 60 70
80 | NULL | NULL | extra data after last expected column |
NULL | 22P04
STDIN | 1 | 2000 230 23
| d | NULL | missing data for column "d" | NULL
| 22P04
STDIN | 1 | z,,""
| a | z | invalid input syntax for type integer: "z" | NULL
| 22P02
STDIN | 2 | \0,,
| a | \0 | invalid input syntax for type integer: "\0" | NULL
| 22P02

> 3. I found spelling:
>
> /* no err_nsp.error_rel table then crete one. for holding error. */
>

fixed.

> 4. Maybe rewrite this comment
>
> these info need, no error will drop err_nsp.error_rel table
> to:
> this information is necessary, no error will lead to the deletion of the err_sp.error_rel table.
>

fixed.

> 5. Is this part of the comment needed? I think it duplicates the information below when we form the query.
>
> * . column list(order by attnum, begin from ctid) =
> * {ctid, lineno,line,field,source,err_message,err_detail,errorcode}
> * . data types (from attnum = -1) ={tid, int8,text,text,text,text,text,text}
>
> I'm not sure if we need to order the rows by number. It might be easier to work with these lines in the order they appear.
>
Simplified the comment. "order by attnum" is to make sure that if
there is a table already existing, and the column name is like X and
the data type like Y, then we consider this table is good for holding
potential error info.

COPY FROM, main entry point is NextCopyFrom.
Now for non-binary mode, if you specified save_error then it will not
fail at NextCopyFrom.
all these three errors will be tolerated: extra data after last
expected column, missing data for column, data type conversion.

Attachment Content-Type Size
v9-0001-Make-COPY-FROM-more-error-tolerant.patch text/x-patch 42.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2023-12-06 10:58:51 Re: Synchronizing slots from primary to standby
Previous Message Alvaro Herrera 2023-12-06 10:42:29 Re: remaining sql/json patches