Re: Improvements and additions to COPY progress reporting

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: Improvements and additions to COPY progress reporting
Date: 2021-02-22 04:49:32
Message-ID: CALj2ACUyNREth8f3M7wXrHVeycfnqBn5pVygYOoBVs=ifo8V4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 22, 2021 at 12:40 AM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
>
> On Sat, 20 Feb 2021 at 07:09, Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > For COPY TO the name "source_type" column and for COPY FROM the name
> > "destination_type" makes sense. To have a combined column name for
> > both, how about naming that column as "io_type"?
>
> Thank you, that's way better! PFA what I believe is a finalized
> patchset v9, utilizing io_type terminology instead of io_target.

Thanks for the patches. I reviewed them.

0001 - I think there's a bug. See COPY TO stdout doesn't print
io_type as "STDIO".

postgres=# select * from pg_stat_progress_copy;
pid | datid | datname | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
--------+-------+----------+-------+---------+---------+-----------------+-------------+------------------+-----------------
977510 | 13003 | postgres | 16384 | COPY TO | |
23961591 | 0 | 2662399 | 0
(1 row)

we should do below(like we do for copyfrom.c):

@@ -702,7 +710,10 @@ BeginCopyTo(ParseState *pstate,
if (pipe)
{
progress_vals[1] = PROGRESS_COPY_IO_TYPE_STDIO;
Assert(!is_program); /* the grammar does not allow this */
if (whereToSendOutput != DestRemote)
cstate->copy_file = stdout;
}

Because if "pipe" is true, that means the transfer is between STDIO.
See below comment:

* If <pipe> is false, transfer is between the table and the file named
* <filename>. Otherwise, transfer is between the table and our regular
* input/output stream. The latter could be either stdin/stdout or a
* socket, depending on whether we're running under Postmaster control.
*

0002 patch looks good to me.

0003 - patch:
I'm doubtful if the "bytes_total": 79 i.e. test file size will be the
same across different platforms and file systems types, if true, then
the below tests will not be stable. Do we also want to exclude the
bytes_total from the output, just to be on the safer side? Thoughts?

copy progress_reporting from stdin;
+INFO: progress: {"command": "COPY FROM", "datname": "regression",
"io_type": "STDIO", "bytes_total": 0, "bytes_processed": 79,
"tuples_excluded": 0, "tuples_processed": 3}
+-- reporting of FILE imports, and correct reporting of tuples-excluded
+copy progress_reporting from '@abs_srcdir@/data/emp.data'
+ where (salary < 2000);
+INFO: progress: {"command": "COPY FROM", "datname": "regression",
"io_type": "FILE", "bytes_total": 79, "bytes_processed": 79,
"tuples_excluded": 1, "tuples_processed": 2}
+-- cleanup progress_reporting

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-02-22 05:05:26 Re: [PATCH]: Allow errors in parameter values to be reported during the BIND phase itself..
Previous Message tsunakawa.takay@fujitsu.com 2021-02-22 04:30:27 RE: libpq debug log