Re: Improvements and additions to COPY progress reporting

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: Improvements and additions to COPY progress reporting
Date: 2021-02-24 03:12:14
Message-ID: CALj2ACWy248FaqacZNWVxx6y+_xUYX2+p0V_ht8YeTo1tP1Ytw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 23, 2021 at 2:57 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> On Mon, 22 Feb 2021 at 05:49, Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Mon, Feb 22, 2021 at 12:40 AM Matthias van de Meent
> > <boekewurm+postgres(at)gmail(dot)com> wrote:
> > >
> > > On Sat, 20 Feb 2021 at 07:09, Bharath Rupireddy
> > > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > > >
> > > > For COPY TO the name "source_type" column and for COPY FROM the name
> > > > "destination_type" makes sense. To have a combined column name for
> > > > both, how about naming that column as "io_type"?
> > >
> > > Thank you, that's way better! PFA what I believe is a finalized
> > > patchset v9, utilizing io_type terminology instead of io_target.
> >
> > Thanks for the patches. I reviewed them.
> >
> > 0001 - I think there's a bug. See COPY TO stdout doesn't print
> > io_type as "STDIO".
>
> Fixed in attached

Thanks.

> > 0003 - patch:
> > I'm doubtful if the "bytes_total": 79 i.e. test file size will be the
> > same across different platforms and file systems types, if true, then
> > the below tests will not be stable. Do we also want to exclude the
> > bytes_total from the output, just to be on the safer side? Thoughts?
>
> I'm fairly certain that input files of the regression tests are
> considered 'binary files' to the test framework and that contents
> don't change between different architectures or OSes. I also think
> that any POSIX-compliant file system would report anything but the
> size of the file contents, i.e. the size of the blob that is the file,
> and that is correctly reported here. Other than that, if bytes_total
> wouldn't be stable, then bytes_processed wouldn't make sense either.
>
> For STDIN / STDOUT you might also have a point (different input
> methods might have different length encodings for the specified
> input), but insofar that I understand the test framework and the
> expected git configurations, the tests run using UTF-8 / ascii only,
> with a single style of newlines[+]. Sadly, I cannot provide examples
> nor outputs for other test framework settings due to my lack of
> experience with running the tests with non-standard settings.
>
> Note, I'm happy to be proven wrong here, in which case I don't
> disagree, but according to my limited knowledge, these outputs should
> be stable.

I'm no expert in different OS architectures, but I see that the
patches are passing on cf bot where they get tested on Windows,
FreeBSD, Linux and macOS platforms.

I have no further comments on the v10 patch set. I tested the patches,
they work as expected. I will mark the cf entry as "Ready for
Committer".

Below are some snapshots from testing:

postgres=# select * from pg_stat_progress_copy;
pid | datid | datname | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
---------+-------+----------+-------+-----------+---------+-----------------+-------------+------------------+-----------------
1089927 | 13003 | postgres | 16384 | COPY FROM | FILE |
104660992 | 888888898 | 12861112 | 0
1089969 | 13003 | postgres | 16384 | COPY FROM | FILE |
76611584 | 888888898 | 9712999 | 0

postgres=# select * from pg_stat_progress_copy;
pid | datid | datname | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
---------+-------+----------+-------+-----------+---------+-----------------+-------------+------------------+-----------------
1089927 | 13003 | postgres | 16384 | COPY FROM | FILE |
203161600 | 888888898 | 0 | 23804080
1089969 | 13003 | postgres | 16384 | COPY FROM | FILE |
150601728 | 888888898 | 0 | 17961241

postgres=# select * from pg_stat_progress_copy;
pid | datid | datname | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
---------+-------+----------+-------+---------+---------+-----------------+-------------+------------------+-----------------
1089927 | 13003 | postgres | 16384 | COPY TO | FILE |
66806479 | 0 | 7422942 | 0
1089969 | 13003 | postgres | 16384 | COPY TO | FILE |
29803951 | 0 | 3311550 | 0

postgres=# select * from pg_stat_progress_copy;
pid | datid | datname | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
---------+-------+----------+-------+---------+---------+-----------------+-------------+------------------+-----------------
1089927 | 13003 | postgres | 16384 | COPY TO | STDIO |
5998293 | 0 | 666477 | 0
1089969 | 13003 | postgres | 16384 | COPY TO | STDIO |
2780586 | 0 | 308954 | 0

postgres=# select * from pg_stat_progress_copy;
pid | datid | datname | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
---------+-------+----------+-------+---------+---------+-----------------+-------------+------------------+-----------------
1089927 | 13003 | postgres | 0 | COPY TO | FILE |
124447239 | 0 | 13827471 | 0
1089969 | 13003 | postgres | 0 | COPY TO | FILE |
90992466 | 0 | 10110274 | 0

postgres=# select * from pg_stat_progress_copy;
pid | datid | datname | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
---------+-------+----------+-------+-----------+---------+-----------------+-------------+------------------+-----------------
1090927 | 13003 | postgres | 16384 | COPY FROM | STDIO |
492465897 | 0 | 55952999 | 0
1091000 | 13003 | postgres | 16384 | COPY FROM | STDIO |
30494360 | 0 | 3950683 | 0

postgres=# select * from pg_stat_progress_copy;
pid | datid | datname | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
---------+-------+----------+-------+-----------+---------+-----------------+-------------+------------------+-----------------
1091217 | 13003 | postgres | 16384 | COPY FROM | STDIO |
230516127 | 0 | 0 | 26847469
1091224 | 13003 | postgres | 16384 | COPY FROM | STDIO |
212020065 | 0 | 0 | 24792351

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kohei KaiGai 2021-02-24 03:18:24 contrib/cube - binary input/output handlers
Previous Message Greg Nancarrow 2021-02-24 03:11:21 Re: Parallel INSERT (INTO ... SELECT ...)