Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>, zhihuifan1213(at)163(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, anisimow(dot)d(at)gmail(dot)com, HukuToc(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2023-12-11 14:05:33
Message-ID: f03bc3de-9769-4106-80cf-158c5f184a1e@yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi! Thank you for your work. Your patch looks better!

On 10.12.2023 13:32, jian he wrote:
> On Fri, Dec 8, 2023 at 3:09 PM Alena Rybakina<lena(dot)ribackina(at)yandex(dot)ru> wrote:
>> Thank you for your work. Unfortunately, your code contained errors during the make installation:
>>
>> 'SAVEPOINT' after 'SAVE_ERROR' in unreserved_keyword list is misplaced
>> 'SAVEPOINT' after 'SAVE_ERROR' in bare_label_keyword list is misplaced
>> make[2]: *** [../../../src/Makefile.global:783: gram.c] Error 1
>> make[1]: *** [Makefile:131: parser/gram.h] Error 2
>> make[1]: *** Waiting for unfinished jobs....
>> make: *** [src/Makefile.global:383: submake-generated-headers] Error 2
>>
>> I have ubuntu 22.04 operation system.
>>
>> On 06.12.2023 13:47, jian he wrote:
>>
>> On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina<lena(dot)ribackina(at)yandex(dot)ru> wrote:
>>
>> Hi!
>>
>> Thank you for your contribution to this thread.
>>
>>
>> I reviewed it and have a few questions.
>>
>> 1. I have seen that you delete a table before creating it, to which you want to add errors due to a failed "copy from" operation. I think this is wrong because this table can save useful data for the user.
>> At a minimum, we should warn the user about this, but I think we can just add some number at the end of the name, such as name_table1, name_table_2.
>>
>> Sorry. I don't understand this part.
>> Currently, if the error table name already exists, then the copy will
>> fail, an error will be reported.
>> I try to first create a table, if no error then the error table will be dropped.
>>
>> To be honest, first of all, I misunderstood this part of the code. Now I see that it works the way you mentioned.
>>
>> However, I didn't see if you dealt with cases where we already had a table with the same name as the table error.
>> I mean, when is he trying to create for the first time, or will we never be able to face such a problem?
>>
>> Can you demo the expected behavior?
>>
>> Unfortunately, I was unable to launch it due to a build issue.
>>
> Hopefully attached will work.

Yes, thank you! It works fine, and I see that the regression tests have
been passed. 🙂

However, when I ran 'copy from with save_error' operation with simple
csv files (copy_test.csv, copy_test1.csv) for tables test, test1 (how I
created it, I described below):

postgres=# create table test (x int primary key, y int not null);
postgres=# create table test1 (x int, z int, CONSTRAINT fk_x
      FOREIGN KEY(x)
          REFERENCES test(x));

I did not find a table with saved errors after operation, although I
received a log about it:

postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to
table public.test_error
ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (x)=(2) already exists.
CONTEXT:  COPY test, line 3

postgres=# select * from public.test_error;
ERROR:  relation "public.test_error" does not exist
LINE 1: select * from public.test_error;

postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ','
CSV save_error
NOTICE:  2 rows were skipped because of error. skipped row saved to
table public.test1_error
ERROR:  insert or update on table "test1" violates foreign key
constraint "fk_x"
DETAIL:  Key (x)=(2) is not present in table "test".

postgres=# select * from public.test1_error;
ERROR:  relation "public.test1_error" does not exist
LINE 1: select * from public.test1_error;

Two lines were written correctly in the csv files, therefore they should
have been added to the tables, but they were not added to the tables
test and test1.

If I leave only the correct rows, everything works fine and the rows are
added to the tables.

in copy_test.csv:

2,0

1,1

in copy_test1.csv:

2,0

2,1

1,1

postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
COPY 2
postgres=# \copy test1 from '/home/alena/copy_test1.csv' DELIMITER ','
CSV save_error
NOTICE:  No error happened.Error holding table public.test1_error will
be droped
COPY 3

Maybe I'm launching it the wrong way. If so, let me know about it.

I also notice interesting behavior if the table was previously created
by the user. When I was creating an error_table before the 'copy from'
operation,
I received a message saying that it is impossible to create a table with
the same name (it is shown below) during the 'copy from' operation.
I think you should add information about this in the documentation,
since this seems to be normal behavior to me.

postgres=# CREATE TABLE test_error (LINENO BIGINT, LINE TEXT,
FIELD TEXT, SOURCE TEXT, ERR_MESSAGE TEXT,
ERR_DETAIL TEXT, ERRORCODE TEXT);
CREATE TABLE
postgres=# \copy test from '/home/alena/copy_test.csv' DELIMITER ',' CSV
save_error
ERROR:  Error save table public.test_error already exists. Cannot use it
for COPY FROM error saving
>
>> 2. I noticed that you are forming a table name using the type of errors that prevent rows from being added during 'copy from' operation.
>> I think it would be better to use the name of the source file that was used while 'copy from' was running.
>> In addition, there may be several such files, it is also worth considering.
>>
>> Another column added.
>> now it looks like:
>>
>> SELECT * FROM save_error_csv_error;
>> filename | lineno | line
>> | field | source | err_message |
>> err_detail | errorcode
>> ----------+--------+----------------------------------------------------+-------+--------+---------------------------------------------+------------+-----------
>> STDIN | 1 | 2002 232 40 50 60 70
>> 80 | NULL | NULL | extra data after last expected column |
>> NULL | 22P04
>> STDIN | 1 | 2000 230 23
>> | d | NULL | missing data for column "d" | NULL
>> | 22P04
>> STDIN | 1 | z,,""
>> | a | z | invalid input syntax for type integer: "z" | NULL
>> | 22P02
>> STDIN | 2 | \0,,
>> | a | \0 | invalid input syntax for type integer: "\0" | NULL
>> | 22P02
>>
>> Yes, I see the "filename" column, and this will solve the problem, but "STDIN" is unclear to me.
> please see comment in struct CopyFromStateData:
> char *filename; /* filename, or NULL for STDIN */
>
Yes, I can see that.

I haven't figured out how to fix it yet either.

>> */
>>
>> Maybe we can rewrite it like this:
>>
>> /* Check, the err_nsp.error_rel table has already existed
>> * and if it is, check its column name and data types.
>>
> refactored.

Fine)

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
copy_test1.csv text/csv 26 bytes
copy_test.csv text/csv 23 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-12-11 14:24:25 Re: [CAUTION!! freemail] Re: Partial aggregates pushdown
Previous Message Amit Kapila 2023-12-11 13:42:39 Re: Synchronizing slots from primary to standby