Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, zhihuifan1213(at)163(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2023-12-05 10:07:24
Message-ID: c7c80c6d-4c02-4629-8a27-5c635d27ad10@yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

Thank you for your contribution to this thread.

On 04.12.2023 05:23, jian he wrote:
> hi.
> here is my implementation based on previous discussions
>
> add a new COPY FROM flag save_error.
> save_error only works with non-BINARY flags.
> save_error is easier for me to implement, if using "save error" I
> worry, 2 words, gram.y will not work.
> save_error also works other flag like {csv mode, force_null, force_not_null}
>
> overall logic is:
> if save_error is specified then
> if error_holding table not exists then create one
> if error_holding table exists set error_firsttime to false.
> if save_error is not specified then work as master branch.
>
> if errors happen then insert error info to error_holding table.
> if errors do not exist and error_firsttime is true then drop the table.
> if errors do not exist and error_firsttime is false then raise a
> notice: All the past error holding saved at %s.%s
>
> error holding table:
> schema will be the same as COPY destination table.
> the table name will be: COPY destination name concatenate with "_error".
>
> error_holding table definition:
> CREATE TABLE err_nsp.error_rel (LINENO BIGINT, LINE TEXT,
> FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
> ERR_DETAIL TEXT, ERRORCODE TEXT);
>
> the following field is not implemented.
> FIELDS text[], separated, de-escaped string fields (the data that was
> or would be fed to input functions)
>
> because imagine following case:
> create type test as (a int, b text);
> create table copy_comp (c1 int, c2 test default '(11,test)', c3 date);
> copy copy_comp from stdin with (default '\D');
> 1 \D '2022-07-04'
> \.
> table copy_comp;
>
> I feel it's hard from textual '\D' to get text[] `(11,test)` via SPI.
> --------------------------------------
> demo:
>
> create table copy_default_error_save (
> id integer,
> text_value text not null default 'test',
> ts_value timestamp without time zone not null default '2022-07-05'
> );
> copy copy_default_error_save from stdin with (save_error, default '\D');
> k value '2022-07-04'
> z \D '2022-07-03ASKL'
> s \D \D
> \.
>
> NOTICE: 3 rows were skipped because of error. skipped row saved to
> table public.copy_default_error_save_error
> select * from copy_default_error_save_error;
> lineno | line | field | source
> | err_message |
> err_detail | errorcode
> --------+----------------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------
> 1 | k value '2022-07-04' | id | k
> | invalid input syntax for type integer: "k" |
> | 22P02
> 2 | z \D '2022-07-03ASKL' | id | z
> | invalid input syntax for type integer: "z" |
> | 22P02
> 2 | z \D '2022-07-03ASKL' | ts_value |
> '2022-07-03ASKL' | invalid input syntax for type timestamp:
> "'2022-07-03ASKL'" | | 22007
> 3 | s \D \D | id | s
> | invalid input syntax for type integer: "s" |
> | 22P02
> (4 rows)
>
> The doc is not so good.
>
> COPY FROM (save_error), it will not be as fast as COPY FROM (save_error false).
> With save_error, we can only use InputFunctionCallSafe, which I
> believe is not as fast as InputFunctionCall.
> If any conversion error happens, we need to call the SPI interface,
> that would add more overhead. also we can only insert error cases row
> by row. (maybe we can insert to error_save values(error1), (error2).
> (I will try later)...
>
> The main code is about constructing SPI query, and test and test output.
I reviewed it and have a few questions.

1. I have seen that you delete a table before creating it, to which you
want to add errors due to a failed "copy from" operation. I think this
is wrong because this table can save useful data for the user.
At a minimum, we should warn the user about this, but I think we can
just add some number at the end of the name, such as name_table1,
name_table_2.

2. I noticed that you are forming a table name using the type of errors
that prevent rows from being added during 'copy from' operation.
I think it would be better to use the name of the source file that was
used while 'copy from' was running.
In addition, there may be several such files, it is also worth considering.

3. I found spelling:

/* no err_nsp.error_rel table then crete one. for holding error. */

4. Maybe rewrite this comment

these info need, no error will drop err_nsp.error_rel table
to:
this information is necessary, no error will lead to the deletion of the
err_sp.error_rel table.

5. Is this part of the comment needed? I think it duplicates the
information below when we form the query.

 * . column list(order by attnum, begin from ctid) =
 *    {ctid, lineno,line,field,source,err_message,err_detail,errorcode}
 * . data types (from attnum = -1) ={tid,
int8,text,text,text,text,text,text}

I'm not sure if we need to order the rows by number. It might be easier
to work with these lines in the order they appear.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Malakhov 2023-12-05 10:09:20 Re: Avoid detoast overhead when possible
Previous Message Daniel Gustafsson 2023-12-05 09:43:48 Re: [PoC] Federated Authn/z with OAUTHBEARER