Improvements and additions to COPY progress reporting

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
Subject: Improvements and additions to COPY progress reporting
Date: 2021-02-08 18:35:45
Message-ID: CAEze2WiOcgdH4aQA8NtZq-4dgvnJzp8PohdeKchPkhMY-jWZXA@mail.gmail.com
Lists: pgsql-hackers

Hi,

With [0] we got COPY progress reporting. Before the column names of
this newly added view are effectively set in stone with the release of
pg14, I propose the following set of relatively small patches. These
are v2, because the patchset is based on a set of patches that I
previously posted in [0].

0001 adds a column to pg_stat_progress_copy which reports the number
of tuples that were excluded from insertion by the WHERE clause of the
COPY FROM command.

0002 alters pg_stat_progress_copy to use 'tuple' terminology instead
of 'line' terminology. 'Line' doesn't make sense in the binary copy
case, and only for the 'text' copy format is there a guarantee that
the source / output file actually contains the reported number of
lines, whereas the number of data tuples (which is also what it's
called internally) is guaranteed to be equal for all copy formats.

There was some discussion about this in [0] where the author thought
'line' is more consistent with the CSV documentation, and where I
argued that 'tuple' is both more consistent with the rest of the
progress reporting tables and more consistent with the actual counted
items: these are the tuples serialized / inserted (as noted in the CSV
docs; "Thus the files are not strictly one line per table row like
text-format files.").
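
To illustrate the line/tuple distinction, here is a minimal Python
sketch (not PostgreSQL code). The payload and the helper names
count_lines and count_tuples are made up for this example; it only
shows that a CSV file with a quoted, embedded newline contains more
physical lines than data tuples.

```python
import csv
import io

# Hypothetical CSV payload: two table rows, the first of which has a
# quoted field containing an embedded newline.
CSV_DATA = '1,"first line\nsecond line"\n2,plain\n'


def count_lines(text: str) -> int:
    """Count the physical lines in the payload."""
    return len(text.splitlines())


def count_tuples(text: str) -> int:
    """Count the logical records (tuples) seen by a CSV parser."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return sum(1 for _ in reader)


if __name__ == "__main__":
    print("lines: ", count_lines(CSV_DATA))
    print("tuples:", count_tuples(CSV_DATA))
```

A counter named after lines would be ambiguous for this input, while
the tuple count matches the number of rows that end up in the table.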

Patch 0003 adds backlinks to the progress reporting docs from the docs
of the commands that have progress reporting ((re)index, cluster,
vacuum, etc.) so that progress reporting is more discoverable from
the relevant commands, and removes the datname column from the
progress_copy view (that column was never committed). This too should
be fairly trivial and uncontroversial.

0004 adds the 'command' column to the progress_copy view, which
distinguishes between COPY FROM and COPY TO. The two commands are (in
my opinion) different enough to warrant this column, similar to the
difference between CREATE INDEX/REINDEX [CONCURRENTLY], which also
report that information. I believe that this change is appropriate,
as the semantics of the columns change depending on the command being
executed.

Lastly, 0005 adds 'io_target' to the reported information, that is,
FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
be determined based on the commands in pg_stat_activity, it is
something that a user could reasonably want to query on, as the
origin/target of COPY has security and performance implications,
whereas other options (e.g. format) are less interesting for clients
that are not executing that specific COPY command.

Of special interest in 0005 is that it reports the io_target for the
internal COPY of logical replication's initial tablesync. This would
otherwise be measured, but no knowledge about the type of copy (or its
origin) would be available on the worker's side. I'm not married to
this patch 0005, but I believe it could be useful, and therefore
included it in the patchset.

With regards,

Matthias van de Meent.

[0] https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com

Attachment Content-Type Size
v2-0005-Add-a-io_target-column-to-the-copy-progress-view.patch text/x-patch 6.3 KB
v2-0002-Rename-lines-to-tuples-in-COPY-progress-reporting.patch text/x-patch 5.7 KB
v2-0001-Add-progress-reporting-for-excluded-rows.patch text/x-patch 4.1 KB
v2-0003-Add-backlinks-to-progress-reporting-documentation.patch text/x-patch 4.4 KB
v2-0004-Add-a-command-column-to-the-copy-progress-view.patch text/x-patch 4.2 KB
