Re: [PATCH] Initial progress reporting for COPY command

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: [PATCH] Initial progress reporting for COPY command
Date: 2020-06-23 10:10:08
Message-ID: CALDaNm1wePVSpgGVTT628mLxZg51yBYzyBJcPPhVeKg7-hPF=g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek <josef(dot)simanek(at)gmail(dot)com> wrote:
>
> Thanks for the hint regarding "CopyReadLineText". I'll take a look.
>
> So far I have tested these cases:
>
> CREATE TABLE test(id int);
> INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
> COPY (SELECT * FROM test) TO '/tmp/ids';
> COPY test FROM '/tmp/ids';
>
> psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000)) TO STDOUT;' > /tmp/ryba.txt
> cat /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'
>
> It is easy to check that the line count and byte count are in sync (one line is 2 bytes here: "1" plus a newline character).
> I'll try to check more complex COPY commands to ensure everything is in sync.
>
> If you have any ideas for test queries, feel free to suggest them.

For the COPY FROM statement, you could attach a debugger to the session
and put a breakpoint at CopyReadLineText. Execution will hit this
breakpoint for every record that COPY FROM processes, so in parallel you
can check from another session whether pg_stat_progress_copy is being
updated correctly. When I tried this, it was showing the file read size
instead of the bytes actually processed.
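
Since every row in the test data quoted above is two bytes ("1" plus a
newline), the byte counter should equal twice the line counter at every
breakpoint stop. Below is a minimal sketch of that check, run against a
small local sample file rather than a live server. The helper name
check_sync and the 1000-row sample are assumptions for illustration and
are not part of the patch.

```shell
#!/bin/sh
# Sketch: verify that a byte counter stays in step with a line counter
# for test data where every row is "1" plus a newline (2 bytes per row).

# check_sync LINES BYTES: succeeds when BYTES equals 2 * LINES.
# (Hypothetical helper, not part of the patch.)
check_sync() {
    [ "$2" -eq $(( $1 * 2 )) ]
}

# Build a small sample file shaped like the COPY output in the test above.
sample=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
    echo 1
    i=$((i + 1))
done > "$sample"

# Count lines and bytes; tr strips the padding some wc versions add.
lines=$(wc -l < "$sample" | tr -d ' ')
bytes=$(wc -c < "$sample" | tr -d ' ')
rm -f "$sample"

if check_sync "$lines" "$bytes"; then
    echo "in sync: $lines lines, $bytes bytes"
else
    echo "out of sync: $lines lines, $bytes bytes"
fi
```

At each breakpoint stop, the two values read from pg_stat_progress_copy
(lines_processed and file_bytes_processed in the view from the patch)
could be passed to check_sync in the same way. A counter that reports the
file read size would presumably run ahead of twice the line count and
fail the check.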

>> +pg_stat_progress_copy| SELECT s.pid,
>> + s.datid,
>> + d.datname,
>> + s.relid,
>> + CASE s.param1
>> + WHEN 0 THEN 'TO'::text
>> + WHEN 1 THEN 'FROM'::text
>> + ELSE NULL::text
>> + END AS direction,
>> + ((s.param2)::integer)::boolean AS file,
>> + ((s.param3)::integer)::boolean AS program,
>> + s.param4 AS lines_processed,
>> + s.param5 AS file_bytes_processed
>>
>> You could include pg_size_pretty for s.param5, like
>> pg_size_pretty(s.param5) AS bytes_processed; it will be easier for
>> users to understand bytes_processed when the data size increases.
>
>
> I looked at the rest of the progress reporting views, and to me they seem to be basic ones that provide raw data, to be used later in custom human-readable views built on the client side.
> For example, "pg_stat_progress_basebackup" also reports "backup_streamed" in raw form.
>
> Anyway, if you would like to make this view more user-friendly, I can add that. Just ping me.

I felt we could add pg_size_pretty to make the view more user-friendly.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
