Re: Improvements and additions to COPY progress reporting

From: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Improvements and additions to COPY progress reporting
Date: 2021-02-09 08:32:48
Message-ID: CAFp7Qwqa6cQoA29xvCzaTtyu7k=1deAVN62PFr_aKMfa-41E_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> napsal:
>
> Hi,
>
> With [0] we got COPY progress reporting. Before the column names of
> this newly added view are effectively set in stone with the release of
> pg14, I propose the following set of relatively small patches. These
> are v2, because it is a patchset that is based on a set of patches
> that I previously posted in [0].
>
> 0001 Adds a column to pg_stat_progress_copy which details the amount
> of tuples that were excluded from insertion by the WHERE clause of the
> COPY FROM command.
>
> 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead
> of 'line'-terminology. 'Line' doesn't make sense in the binary copy
> case, and only for the 'text' copy format there can be a guarantee
> that the source / output file actually contains the reported amount of
> lines, whereas the amount of data tuples (which is also what it's
> called internally) is guaranteed to equal for all data types.
>
> There was some discussion about this in [0] where the author thought
> 'line' is more consistent with the CSV documentation, and where I
> argued that 'tuple' is both more consistent with the rest of the
> progress reporting tables and more consistent with the actual counted
> items: these are the tuples serialized / inserted (as noted in the CSV
> docs; "Thus the files are not strictly one line per table row like
> text-format files.").
>
> Patch 0003 adds backlinks to the progress reporting docs from the docs
> of the commands that have progress reporting (re/index, cluster,
> vacuum, etc.) such that progress reporting is better discoverable from
> the relevant commands, and removes the datname column from the
> progress_copy view (that column was never committed). This too should
> be fairly trivial and uncontroversial.
>
> 0004 adds the 'command' column to the progress_copy view; which
> distinguishes between COPY FROM and COPY TO. The two commands are (in
> my opinion) significantly different enough to warrant this column;
> similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY]
> which also report that information. I believe that this change is
> appropriate; as the semantics of the columns change depending on the
> command being executed.
>
> Lastly, 0005 adds 'io_target' to the reported information, that is,
> FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> be determined based on the commands in pg_stat_activity, it is
> reasonably something that a user would want to query on, as the
> origin/target of COPY has security and performance implications,
> whereas other options (e.g. format) are less interesting for clients
> that are not executing that specific COPY command.

I took a little deeper look and I'm not sure if I understand FILE and
STDIO. I have finally tried to finalize some initial regress testing
of COPY command progress using triggers. I have attached the initial
patch applicable to your changes. As you can see COPY FROM STDIN is
reported as FILE. That's probably expected, but it is a little
confusing for me since STDIN and STDIO sound similar. What is the
purpose of STDIO? When is the COPY command reported with io_target of
STDIO?

> Of special interest in 0005 is that it reports the io_target for the
> logical replications' initial tablesyncs' internal COPY. This would
> otherwise be measured, but no knowledge about the type of copy (or its
> origin) would be available on the worker's side. I'm not married to
> this patch 0005, but I believe it could be useful, and therefore
> included it in the patchset.
>
>
> With regards,
>
> Matthias van de Meent.
>
>
> [0] https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com

Attachment Content-Type Size
v2-0006-initial-regress-testing-of-COPY-progress.patch text/x-patch 4.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2021-02-09 08:33:22 Re: Single transaction in the tablesync worker?
Previous Message tsunakawa.takay@fujitsu.com 2021-02-09 08:30:37 RE: [POC] Fast COPY FROM command for the table with foreign partitions