Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, zhihuifan1213(at)163(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2023-12-04 02:23:30
Message-ID: CACJufxGBb-WE7R1fM4kJxvmsgsyKL_4vce3s_KxLbo_ZEnFnWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

hi.
here is my implementation based on previous discussions

add a new COPY FROM flag save_error.
save_error only works with non-BINARY flags.
save_error is easier for me to implement, if using "save error" I
worry, 2 words, gram.y will not work.
save_error also works other flag like {csv mode, force_null, force_not_null}

overall logic is:
if save_error is specified then
if error_holding table not exists then create one
if error_holding table exists set error_firsttime to false.
if save_error is not specified then work as master branch.

if errors happen then insert error info to error_holding table.
if errors do not exist and error_firsttime is true then drop the table.
if errors do not exist and error_firsttime is false then raise a
notice: All the past error holding saved at %s.%s

error holding table:
schema will be the same as COPY destination table.
the table name will be: COPY destination name concatenate with "_error".

error_holding table definition:
CREATE TABLE err_nsp.error_rel (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);

the following field is not implemented.
FIELDS text[], separated, de-escaped string fields (the data that was
or would be fed to input functions)

because imagine following case:
create type test as (a int, b text);
create table copy_comp (c1 int, c2 test default '(11,test)', c3 date);
copy copy_comp from stdin with (default '\D');
1 \D '2022-07-04'
\.
table copy_comp;

I feel it's hard from textual '\D' to get text[] `(11,test)` via SPI.
--------------------------------------
demo:

create table copy_default_error_save (
id integer,
text_value text not null default 'test',
ts_value timestamp without time zone not null default '2022-07-05'
);
copy copy_default_error_save from stdin with (save_error, default '\D');
k value '2022-07-04'
z \D '2022-07-03ASKL'
s \D \D
\.

NOTICE: 3 rows were skipped because of error. skipped row saved to
table public.copy_default_error_save_error
select * from copy_default_error_save_error;
lineno | line | field | source
| err_message |
err_detail | errorcode
--------+----------------------------------+----------+------------------+-------------------------------------------------------------+------------+-----------
1 | k value '2022-07-04' | id | k
| invalid input syntax for type integer: "k" |
| 22P02
2 | z \D '2022-07-03ASKL' | id | z
| invalid input syntax for type integer: "z" |
| 22P02
2 | z \D '2022-07-03ASKL' | ts_value |
'2022-07-03ASKL' | invalid input syntax for type timestamp:
"'2022-07-03ASKL'" | | 22007
3 | s \D \D | id | s
| invalid input syntax for type integer: "s" |
| 22P02
(4 rows)

The doc is not so good.

COPY FROM (save_error), it will not be as fast as COPY FROM (save_error false).
With save_error, we can only use InputFunctionCallSafe, which I
believe is not as fast as InputFunctionCall.
If any conversion error happens, we need to call the SPI interface,
that would add more overhead. also we can only insert error cases row
by row. (maybe we can insert to error_save values(error1), (error2).
(I will try later)...

The main code is about constructing SPI query, and test and test output.

Attachment Content-Type Size
v8-0001-Add-a-new-COPY-option-SAVE_ERROR.patch text/x-patch 37.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2023-12-04 02:46:48 connection timeout hint
Previous Message li jie 2023-12-04 02:05:58 Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding